SYSTEM FOR THE IN VIVO DELIVERY AND
EXPRESSION OF HETEROLOGOUS GENES IN 
THE BONE MARROW 

FEDERALLY SPONSORED RESEARCH 
This invention was made with Government support under Grant 
Number 5 R01 AI22186 from the National Institutes of Health. The Government
has certain rights to this invention.

FIELD OF THE INVENTION 
The present invention relates to recombinant DNA technology, and in 
particular to introducing and expressing foreign DNA in a eukaryotic cell.

BACKGROUND OF THE INVENTION 
The Alphavirus genus includes a variety of viruses all of which are 
members of the Togaviridae family. The alphaviruses include Eastern Equine 
Encephalitis virus (EEE), Venezuelan Equine Encephalitis virus (VEE), Everglades 
virus, Mucambo virus, Pixuna virus, Western Equine Encephalitis virus (WEE), 
Sindbis virus, South African Arbovirus No. 86 (S.A.AR 86), Girdwood S.A. 
virus, Ockelbo virus, Semliki Forest virus, Middelburg virus, Chikungunya virus, 
O'Nyong-Nyong virus, Ross River virus, Barmah Forest virus, Getah virus, 
Sagiyama virus, Bebaru virus, Mayaro virus, Una virus, Aura virus, Whataroa 
virus, Babanki virus, Kyzylagach virus, Highlands J virus, Fort Morgan virus, 
Ndumu virus, and Buggy Creek virus.

The alphavirus genome is a single-stranded, messenger-sense RNA,
modified at the 5'-end with a methylated cap, and at the 3'-end with a
variable-length poly(A) tract. The viral genome is divided into two regions: the first
encodes the nonstructural or replicase proteins (nsP1-nsP4) and the second encodes
the viral structural proteins. Strauss and Strauss, Microbiological Rev. 58,
491-562, 494 (1994). Structural subunits consisting of a single viral protein, C,
associate with themselves and with the RNA genome in an icosahedral
nucleocapsid. In the virion, the capsid is surrounded by a lipid envelope covered
with a regular array of transmembranal protein spikes, each of which consists of
a heterodimeric complex of two glycoproteins, E1 and E2. See Paredes et al.,
Proc. Natl. Acad. Sci. USA 90, 9095-99 (1993); Paredes et al., Virology 187,
324-32 (1993); Pedersen et al., J. Virol. 14:40 (1974).

Sindbis virus, the prototype member of the alphavirus genus of the
family Togaviridae, and viruses related to Sindbis are broadly distributed
throughout Africa, Europe, Asia, the Indian subcontinent, and Australia, based on
serological surveys of humans, domestic animals and wild birds. Kokernot et al.,
Trans. R. Soc. Trop. Med. Hyg. 59, 553-62 (1965); Redaksie, S. Afr. Med. J. 42,
197 (1968); Adekolu-John and Fagbami, Trans. R. Soc. Trop. Med. Hyg. 77,
149-51 (1983); Darwish et al., Trans. R. Soc. Trop. Med. Hyg. 77, 442-45 (1983);
Lundstrom et al., Epidemiol. Infect. 106, 567-74 (1991); Morrill et al., J. Trop.
Med. Hyg. 94, 166-68 (1991). The first isolate of Sindbis virus (strain AR339)
was recovered from a pool of Culex sp. mosquitoes collected in Sindbis, Egypt in
1953 (Taylor et al., Am. J. Trop. Med. Hyg. 4, 844-62 (1955)), and is the most
extensively studied representative of this group. Other members of the Sindbis
group of alphaviruses include South African Arbovirus No. 86, Ockelbo82, and
Girdwood S.A. These viruses are not strains of the Sindbis virus; they are related
to Sindbis AR339, but they are more closely related to each other based on
nucleotide sequence and serological comparisons. Lundstrom et al., J. Wildl. Dis.
29, 189-95 (1993); Simpson et al., Virology 222, 464-69 (1996). Ockelbo82,
S.A.AR86 and Girdwood S.A. are all associated with human disease, whereas
Sindbis is not. The clinical symptoms of human infection with Ockelbo82,
S.A.AR86, or Girdwood S.A. are a febrile illness, general malaise, maculopapular
rash, and joint pain that occasionally progresses to a polyarthralgia sometimes
lasting from a few months to a few years.

The study of these viruses has led to the development of beneficial 
techniques for vaccinating against the alphavirus diseases, and other diseases 
through the use of alphavirus vectors for the introduction of foreign DNA. See 
United States Patent No. 5,185,440 to Davis et al., and PCT Publication WO 
92/10578. It is intended that all United States patent references be incorporated 
in their entirety by reference. 

It is well known that live, attenuated viral vaccines are among the 
most successful means of controlling viral disease. However, for some virus 
pathogens, immunization with a live virus strain may be either impractical or 
unsafe. One alternative strategy is the insertion of sequences encoding immunizing 
antigens of such agents into a vaccine strain of another virus. One such system 
utilizing a live VEE vector is described in United States Patent No. 5,505,947 to 
Johnston et al. 

Sindbis virus vaccines have been employed as viral carriers in virus 
constructs which express genes encoding immunizing antigens for other viruses. 
See United States Patent No. 5,217,879 to Huang et al. Huang et al. describes 
Sindbis infectious viral vectors. However, the reference does not describe the 
cDNA sequence of Girdwood S.A. and TR339, nor clones or viral vectors 
produced therefrom. 

Another such system is described by Hahn et al., Proc. Natl. Acad.
Sci. USA 89:2679 (1992), wherein Sindbis virus constructs which express a
truncated form of the influenza hemagglutinin protein are described. The
constructs are used to study antigen processing and presentation in vitro and in
mice. Although no infectious challenge dose is tested, it is also suggested that
such constructs might be used to produce protective B- and T-cell mediated
immunity.

London et al., Proc. Natl. Acad. Sci. USA 89, 207-11 (1992), 
disclose a method of producing an immune response in mice against a lethal Rift 
Valley Fever (RVF) virus by infecting the mice with an infectious Sindbis virus 
containing an RVF epitope. London does not disclose using Girdwood S.A. or 
TR339 to induce an immune response in animals. 

Viral carriers can also be used to introduce and express foreign 
DNA in eukaryotic cells. One goal of such techniques is to employ vectors that 
target expression to particular cells and/or tissues. A current approach has been 
to remove target cells from the body, culture them ex vivo, infect them with an 
expression vector, and then reintroduce them into the patient. 

PCT Publication No. WO 92/10578 to Garoff and Liljestrom
provides a system for introducing and expressing foreign proteins in animal cells
using alphaviruses. This reference discloses the use of Semliki Forest virus to 
introduce and express foreign proteins in animal cells. The use of Girdwood S.A. 
or TR339 is not discussed. Furthermore, this reference does not provide a method 
of targeting and introducing foreign DNA into specific cell or tissue types. 

Accordingly, there remains a need in the art for full-length cDNA 
clones of positive-strand RNA viruses, such as Girdwood S.A. and TR339. In
addition, there is an ongoing need in the art for improved vaccination strategies. 
Finally, there remains a need in the art for improved methods and nucleic acid 
sequences for delivering foreign DNA to target cells. 



SUMMARY OF THE INVENTION 
A first aspect of the present invention is a method of introducing 
and expressing heterologous RNA in bone marrow cells, comprising: (a) providing
a recombinant alphavirus, the alphavirus containing a heterologous RNA segment,
the heterologous RNA segment comprising a promoter operable in bone marrow
cells operatively associated with a heterologous RNA to be expressed in bone
marrow cells; and then (b) contacting the recombinant alphavirus to the bone
marrow cells so that the heterologous RNA segment is introduced and expressed
therein.

As a second aspect, the present invention provides a helper cell for
expressing an infectious, propagation defective, Girdwood S.A. virus particle,
comprising, in a Girdwood S.A.-permissive cell: (a) a first helper RNA encoding
(i) at least one Girdwood S.A. structural protein, and (ii) not encoding at least one
other Girdwood S.A. structural protein; and (b) a second helper RNA separate
from the first helper RNA, the second helper RNA (i) not encoding the at least one
Girdwood S.A. structural protein encoded by the first helper RNA, and (ii)
encoding the at least one other Girdwood S.A. structural protein not encoded by
the first helper RNA, and with all of the Girdwood S.A. structural proteins
encoded by the first and second helper RNAs assembling together into Girdwood
S.A. particles in the cell containing the replicon RNA; and wherein the Girdwood
S.A. packaging segment is deleted from at least the first helper RNA.



A third aspect of the present invention is a method of making 
infectious, propagation defective, Girdwood S.A. virus particles, comprising: 
transfecting a Girdwood S.A.-permissive cell with a propagation defective replicon
RNA, the replicon RNA including the Girdwood S.A. packaging segment and an 
inserted heterologous RNA; producing the Girdwood S.A. virus particles in the 
transfected cell; and then collecting the Girdwood S.A. virus particles from the 
cell. Also disclosed are infectious Girdwood S.A. RNAs, cDNAs encoding the 
same, infectious Girdwood S.A. virus particles, and pharmaceutical formulations 
thereof. 

As a fourth aspect, the present invention provides a helper cell for 
expressing an infectious, propagation defective, TR339 virus particle, comprising,
in a TR339-permissive cell: (a) a first helper RNA encoding (i) at least one TR339
structural protein, and (ii) not encoding at least one other TR339 structural protein; 
and (b) a second helper RNA separate from the first helper RNA, the second 
helper RNA (i) not encoding the at least one TR339 structural protein encoded by 
the first helper RNA, and (ii) encoding the at least one other TR339 structural 
protein not encoded by the first helper RNA, and with all of the TR339 structural 
proteins encoded by the first and second helper RNAs assembling together into 
TR339 particles in the cell containing the replicon RNA; and wherein the TR339 
packaging segment is deleted from at least the first helper RNA. 






A fifth aspect of the present invention is a method of making 
infectious, propagation defective, TR339 virus particles, comprising: transfecting 
a TR339-permissive cell with a propagation defective replicon RNA, the replicon 
RNA including the TR339 packaging segment and an inserted heterologous RNA; 
producing the TR339 virus particles in the transfected cell; and then collecting the 
TR339 virus particles from the cell. Also disclosed are infectious TR339 RNAs,
cDNAs encoding the same, infectious TR339 virus particles, and pharmaceutical 
formulations thereof. 



As a sixth aspect, the present invention provides a recombinant 
DNA comprising a cDNA coding for an infectious Girdwood S.A. virus RNA
transcript, and a heterologous promoter positioned upstream from the cDNA and 
operatively associated therewith. The present invention also provides infectious 
RNA transcripts encoded by the above-mentioned cDNA and infectious viral 
particles containing the infectious RNA transcripts. 



As a seventh aspect, the present invention provides a recombinant 
DNA comprising a cDNA coding for a Sindbis strain TR339 RNA transcript, and 
a heterologous promoter positioned upstream from the cDNA and operatively 
associated therewith. The present invention also provides infectious RNA 
transcripts encoded by the above-mentioned cDNA and infectious viral particles 
containing the infectious RNA transcripts.

The foregoing and other aspects of the present invention are
described in the detailed description set forth below. 

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 presents the cDNA sequence (SEQ ID NO:1) of
S.A.AR86. The RNA sequence of the 5' 40 nucleotides was obtained by direct
sequencing of the genomic RNA. The rest of the genome was sequenced by
RT-PCR of fragments amplified from virion RNA. Nucleotides 1 through 59
represent the 5' UTR, the non-structural polyprotein is encoded by nucleotides 60
through 7559 (nsP1-nt60 through nt1679; nsP2-nt1680 through nt4099;
nsP3-nt4100 through nt5729; nsP4-nt5730 through nt7559), the structural polyprotein
is encoded by nucleotides 7608 through 11342 (capsid-nt7608 through nt8399;
E3-nt8400 through nt8591; E2-nt8592 through nt9860; 6K-nt9861 through nt10025;
E1-nt10026 through nt11342), and the 3' UTR is represented by nucleotides
11346 through 11663.
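The coordinate map recited for Figure 1 can be tabulated and checked mechanically. The following is a minimal sketch only; the names `SAAR86_REGIONS`, `region_length`, and `gaps` are hypothetical and are not part of the disclosure. It records the 1-based, inclusive coordinates as stated above and reports any stretch of nucleotides lying between consecutive annotated regions.

```python
# Coordinates transcribed from the Figure 1 description (1-based, inclusive).
# All identifiers here are illustrative assumptions, not part of the disclosure.
SAAR86_REGIONS = [
    ("5' UTR", 1, 59),
    ("nsP1", 60, 1679),
    ("nsP2", 1680, 4099),
    ("nsP3", 4100, 5729),
    ("nsP4", 5730, 7559),
    ("capsid", 7608, 8399),
    ("E3", 8400, 8591),
    ("E2", 8592, 9860),
    ("6K", 9861, 10025),
    ("E1", 10026, 11342),
    ("3' UTR", 11346, 11663),
]


def region_length(start, end):
    """Length in nucleotides of a 1-based, inclusive interval."""
    return end - start + 1


def gaps(regions):
    """Return (previous, next, gap_size) for neighbours that are not adjacent."""
    found = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(regions, regions[1:]):
        if start_b != end_a + 1:
            found.append((name_a, name_b, start_b - end_a - 1))
    return found


if __name__ == "__main__":
    for name, start, end in SAAR86_REGIONS:
        print(f"{name:8s} {start:6d}-{end:<6d} {region_length(start, end):5d} nt")
    print(gaps(SAAR86_REGIONS))
```

With the coordinates as recited, the only unannotated stretches are the 48 nucleotides between nsP4 and the capsid coding region and the 3 nucleotides between E1 and the 3' UTR.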

Figure 1A shows nucleotides 1 through 3800 of the cDNA sequence
of S.A.AR86.

Figure 1B shows nucleotides 3801 through 7900 of the cDNA
sequence of S.A.AR86. 

Figure 1C shows nucleotides 7901 through 11663 of the cDNA 
sequence of S.A.AR86. 

Figure 2 presents the putative amino acid sequences of the
S.A.AR86 polyproteins (SEQ ID NO:2 and SEQ ID NO:3). The amino acids
were derived from the S.A.AR86 cDNA sequence given in Figure 1 (SEQ ID
NO:1).

Figure 2A shows the amino acid sequence of the non-structural
polyprotein of S.A.AR86 (SEQ ID NO:2).

Figure 2B shows the amino acid sequence of the structural 
polyprotein of S.A.AR86 (SEQ ID NO:3). 

Figure 3 presents the cDNA sequence (SEQ ID NO:4) of Girdwood
S.A. The RNA sequence of the 5' 40 nucleotides was obtained by direct
sequencing of the genomic RNA. The rest of the genome sequence was obtained
by sequencing of fragments amplified by RT-PCR from virion RNA. An "N" in
the sequence indicates that the identity of the nucleotide at that position is
unknown. Nucleotides 1 through 59 represent the 5' UTR, the non-structural
polyprotein is encoded by nucleotides 60 through 7613 (nsP1-nt60 through
nt1679; nsP2-nt1680 through nt4099; nsP3-nt4100 through nt5762 or nt5783;
nsP4-nt5784 through nt7613), the structural polyprotein is encoded by nucleotides
7662 through 11396 (capsid-nt7662 through nt8453; E3-nt8454 through nt8645;
E2-nt8646 through nt9914; 6K-nt9915 through nt10079; E1-nt10080 through
nt11396), and the 3' UTR is represented by nucleotides 11400 through 11717.
There is an opal termination codon at nucleotides 5763 through 5765.

Figure 3A shows nucleotides 1 through 3800 of the cDNA sequence 
of Girdwood S.A. 

Figure 3B shows nucleotides 3801 through 7900 of the cDNA
sequence of Girdwood S.A.

Figure 3C shows nucleotides 7901 through 11717 of the cDNA 
sequence of Girdwood S.A. 

Figure 4 illustrates the putative amino acid sequences of the
Girdwood S.A. polyproteins (SEQ ID NO:5 and SEQ ID NO:6). The amino
acids were derived from the Girdwood S.A. cDNA sequence given in Figure 3
(SEQ ID NO:4).

Figure 4A shows the amino acid sequence of the non-structural 
polyprotein of Girdwood S.A. The sequence terminates at the opal termination 
codon. The complete amino acid sequence is presented in SEQ ID NO:5. 

Figure 4B shows the amino acid sequence of the structural 
polyprotein of Girdwood S.A. (SEQ ID NO:6). 

Figure 5 illustrates the nucleotide sequence (SEQ ID NO:7) of 
clone pS55, a cDNA clone of the S.A.AR86 genomic RNA. 

Figure 5A shows nucleotides 1 through 6720 of the cDNA sequence
of pS55.

Figure 5B shows nucleotides 6721 through 11663 of the cDNA 
sequence of pS55. 

Figure 6 presents the cDNA sequence (SEQ ID NO:8) of clone 
pTR339. The TR339 virus is derived from this clone. Nucleotides 1 through 59 
represent the 5' UTR, the non-structural polyprotein is encoded by nucleotides 60 
through 7598 (nsP1-nt60 through nt1679; nsP2-nt1680 through nt4099;
nsP3-nt4100 through nt5747 or nt5768; nsP4-nt5769 through nt7598), the structural
polyprotein is encoded by nucleotides 7647 through 11381 (capsid-nt7647 through
nt8438; E3-nt8439 through nt8630; E2-nt8631 through nt9899; 6K-nt9900
through nt10064; E1-nt10065 through nt11381), and the 3' UTR is represented
by nucleotides 11382 through 11703. There is an opal termination codon at 
nucleotides 5748 through 5750. 

Figure 6A shows nucleotides 1 through 6720 of the cDNA sequence
of pTR339.

Figure 6B shows nucleotides 6721 through 11703 of the cDNA
sequence of pTR339.

DETAILED DESCRIPTION OF THE INVENTION
The production and use of recombinant DNA, vectors, transformed 
host cells, selectable markers, proteins, and protein fragments by genetic 
engineering are well-known to those skilled in the art. See, e.g., United States 
Patent No. 4,761,371 to Bell et al. at Col. 6 line 3 to Col. 9 line 65; United States 
Patent No. 4,877,729 to Clark et al. at Col. 4 line 38 to Col. 7 line 6; United
States Patent No. 4,912,038 to Schilling at Col. 3 line 26 to Col. 14 line 12; and
United States Patent No. 4,879,224 to Wallner at Col. 6 line 8 to Col. 8 line 59. 

The term "alphavirus" has its conventional meaning in the art, and 
includes the various species of alphaviruses such as Eastern Equine Encephalitis 
virus (EEE), Venezuelan Equine Encephalitis virus (VEE), Everglades virus, 
Mucambo virus, Pixuna virus, Western Equine Encephalitis virus (WEE), Sindbis virus,
South African Arbovirus No. 86, Girdwood S.A. virus, Ockelbo virus, Semliki
Forest virus, Middelburg virus, Chikungunya virus, O'Nyong-Nyong virus, Ross
River virus, Barmah Forest virus, Getah virus, Sagiyama virus, Bebaru virus,
Mayaro virus, Una virus, Aura virus, Whataroa virus, Babanki virus, Kyzylagach
virus, Highlands J virus, Fort Morgan virus, Ndumu virus, Buggy Creek virus,
and any other virus classified by the International Committee on Taxonomy of
Viruses (ICTV) as an alphavirus. The preferred alphaviruses for use in the present
invention include Sindbis virus strains (e.g., TR339), Girdwood S.A., S.A.AR86,
and Ockelbo82. 

An "Old World alphavirus" is a virus that is primarily distributed 
throughout the Old World. Alternately stated, an Old World alphavirus is a virus 
that is primarily distributed throughout Africa, Asia, Australia and New Zealand, 
or Europe. Exemplary Old World viruses include SF group alphaviruses and SIN 
group alphaviruses. SF group alphaviruses include Semliki Forest virus, 
Middelburg virus, Chikungunya virus, O'Nyong-Nyong virus, Ross River virus,
Barmah Forest virus, Getah virus, Sagiyama virus, Bebaru virus, Mayaro virus,
and Una virus. SIN group alphaviruses include Sindbis virus, South African
Arbovirus No. 86, Ockelbo virus, Girdwood S.A. virus, Aura virus, Whataroa
virus, Babanki virus, and Kyzylagach virus. 

Acceptable alphaviruses include those containing attenuating
mutations. The phrases "attenuating mutation" and "attenuating amino acid," as
used herein, mean a nucleotide sequence containing a mutation, or an amino acid
encoded by a nucleotide sequence containing a mutation, which mutation results 
in a decreased probability of causing disease in its host (i.e., a loss of virulence), 
in accordance with standard terminology in the art, whether the mutation be a 
substitution mutation or an in-frame deletion mutation. See, e.g., B. DAVIS ET 
AL., MICROBIOLOGY 132 (3d ed. 1980). The phrase "attenuating mutation" 
excludes mutations or combinations of mutations which would be lethal to the 
virus. 

Appropriate attenuating mutations will be dependent upon the 
alphavirus used. Suitable attenuating mutations within the alphavirus genome will 
be known to those skilled in the art. Exemplary attenuating mutations include, but 
are not limited to, those described in United States Patent No. 5,505,947 to 
Johnston et al., copending United States application 08/448,630 to Johnston et al.,
and copending United States application 08/446,932 to Johnston et al. It is 
intended that all United States patent references be incorporated in their entirety 
by reference. 

Attenuating mutations may be introduced into the RNA by 
performing site-directed mutagenesis on the cDNA which encodes the RNA, in 
accordance with known procedures. See, Kunkel, Proc. Natl. Acad. Sci. USA 82, 
488 (1985), the disclosure of which is incorporated herein by reference in its 
entirety. Alternatively, mutations may be introduced into the RNA by replacement 
of homologous restriction fragments in the cDNA which encodes the RNA, in
accordance with known procedures.

I. Methods for Introducing and Expressing Heterologous RNA in Bone
Marrow Cells.

The present invention provides methods of using a recombinant 
alphavirus to introduce and express a heterologous RNA in bone marrow cells. 
Such methods are useful as vaccination strategies when the heterologous RNA
encodes an immunogenic protein or peptide. Alternatively, such methods are 
useful in introducing and expressing in bone marrow cells an RNA which encodes 
a desirable protein or peptide, for example, a therapeutic protein or peptide. 

The present invention is carried out using a recombinant alphavirus 
to introduce a heterologous RNA into bone marrow cells. Any alphavirus that
targets and infects bone marrow cells is suitable. Preferred alphaviruses include
Old World alphaviruses, more preferably SF group alphaviruses and SIN group
alphaviruses, more preferably Sindbis virus strains (e.g., TR339), S.A.AR86
virus, Girdwood S.A. virus, and Ockelbo virus. In a more preferred embodiment,
the alphavirus contains one or more attenuating mutations, as described
hereinabove. 

Two types of recombinant virus vector are contemplated in carrying
out the present invention. In one embodiment employing "double promoter
vectors," the heterologous RNA is inserted into a replication and propagation
competent virus. Double promoter vectors are described in United States Patent
No. 5,505,947 to Johnston et al. With this type of viral vector, it is preferable
that heterologous RNA sequences of less than 3 kilobases are inserted into the viral
vector, more preferably those less than 2 kilobases, and more preferably still those
less than 1 kilobase. In an alternate embodiment, propagation-defective "replicon
vectors," as described in copending United States application 08/448,630 to
Johnston et al., will be used. One advantage of replicon viral vectors is that larger
RNA inserts, up to approximately 4-5 kilobases in length, can be utilized. Double
promoter vectors and replicon vectors are described in more detail hereinbelow.
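The insert-size preferences stated above for the two vector types can be expressed as a simple screening rule. The sketch below is illustrative only: the function name `candidate_vectors` is hypothetical, and the 5.0 kilobase ceiling for replicon vectors is an assumption taken from the upper end of the "approximately 4-5 kilobases" range, not a hard limit stated in the text.

```python
# Illustrative screening rule; thresholds are taken from the paragraph above.
# The 5.0 kb replicon ceiling is an assumption (upper end of "approximately 4-5 kb").
DOUBLE_PROMOTER_MAX_KB = 3.0   # inserts of less than 3 kb are preferred
REPLICON_MAX_KB = 5.0          # assumed upper bound for replicon inserts


def candidate_vectors(insert_kb):
    """Return the vector types whose stated size preference admits the insert."""
    if insert_kb <= 0:
        raise ValueError("insert length must be positive")
    options = []
    if insert_kb < DOUBLE_PROMOTER_MAX_KB:
        options.append("double promoter")
    if insert_kb <= REPLICON_MAX_KB:
        options.append("replicon")
    return options


if __name__ == "__main__":
    for kb in (0.8, 2.5, 4.0, 6.0):
        print(kb, candidate_vectors(kb))
```

An empty result indicates only that the insert exceeds the sizes discussed here; it does not establish that such an insert could not be accommodated in practice.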

The recombinant alphaviruses of the claimed method target the
heterologous RNA to bone marrow cells, where it expresses the encoded protein or 
peptide. Heterologous RNA can be introduced and expressed in any cell type found in the 
bone marrow. Bone marrow cells that may be targeted by the recombinant alphaviruses of 
the present invention include, but are not limited to, polymorphonuclear cells, hemopoietic
stem cells (including megakaryocyte colony forming units (CFU-M), spleen colony
forming units (CFU-S), erythroid colony forming units (CFU-E), erythroid burst forming
units (BFU-E), and colony forming units in culture (CFU-C)), erythrocytes, macrophages
(including reticular cells), monocytes, granulocytes, megakaryocytes, lymphocytes,
fibroblasts, osteoprogenitor cells, osteoblasts, osteoclasts, marrow stromal cells,
chondrocytes and other cells of synovial joints. Preferably, marrow cells within the
endosteum are targeted, more preferably osteoblasts. Also preferred are methods in which
cells in the endosteum of synovial joints (e.g., hip and knee joints) are targeted.

By targeting to the cells of the bone marrow, it is meant that the primary 
site in which the virus will be localized in vivo is the cells of the bone marrow.
Alternately stated, the alphaviruses of the present invention target bone marrow cells, such
that titers in bone marrow two days after infection are greater than 100 PFU/g crushed
bone, preferably greater than 200 PFU/g crushed bone, more preferably greater than 300
PFU/g crushed bone, and more preferably still greater than 500 PFU/g crushed bone.
Virus may be detected occasionally in other cell or tissue types, but only sporadically and
usually at low levels. Virus localization in the bone marrow can be demonstrated by any 
suitable technique known in the art, such as in situ hybridization. 
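The titer thresholds recited above form a simple graded scale. The following sketch, offered only as an illustration, maps a measured bone marrow titer (PFU per gram of crushed bone, two days after infection) to the corresponding tier. The function name `titer_tier` and the tier labels are hypothetical conveniences and do not appear in the disclosure.

```python
# Thresholds (PFU/g crushed bone, two days after infection) from the paragraph above.
# Tier labels are illustrative assumptions; each tier requires a titer strictly
# greater than its threshold, matching the "greater than" wording of the text.
TIERS = [
    (500, "more preferred still"),
    (300, "more preferred"),
    (200, "preferred"),
    (100, "meets targeting criterion"),
]


def titer_tier(pfu_per_gram):
    """Return the highest tier whose threshold the titer strictly exceeds."""
    if pfu_per_gram < 0:
        raise ValueError("titer cannot be negative")
    for threshold, label in TIERS:
        if pfu_per_gram > threshold:
            return label
    return "below targeting criterion"


if __name__ == "__main__":
    for titer in (50, 100, 250, 350, 1200):
        print(titer, "->", titer_tier(titer))
```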

Bone marrow cells are long-lived and harbor infectious alphaviruses for a 
prolonged period of time, as demonstrated in the Examples below. These characteristics 
of bone marrow cells render the present invention useful not only for the purpose of
supplying a desired protein or peptide to skeletal tissue, but also for expressing proteins or 
peptides in vivo that are needed by other cell or tissue types. 

The present invention can be carried out in vivo or with cultured bone 
marrow cells in vitro. Bone marrow cell cultures include primary cultures
of bone marrow cells, serially-passaged cultures of bone marrow cells, and
cultures of immortalized bone marrow cell lines. Bone marrow cells may be 
cultured by any suitable means known in the art. 

The recombinant alphaviruses of the present invention carry a 
heterologous RNA segment. The heterologous RNA segment encodes a promoter 
and an inserted heterologous RNA. The inserted heterologous RNA may encode 
any protein or a peptide which is desirably expressed by the host bone marrow 
cells. Suitable heterologous RNA may be of prokaryotic (e.g., RNA encoding the
Botulinus toxin C), or eukaryotic (e.g., RNA encoding malaria Plasmodium 
protein csl) origin. Illustrative proteins and peptides encoded by the heterologous 
RNAs of the present invention include hormones, growth factors, interleukins, 
cytokines, chemokines, enzymes, and ribozymes. Alternately, the heterologous 
RNAs encode any therapeutic protein or peptide. As a further alternative, the 
heterologous RNAs of the present invention encode any immunogenic protein or 
peptide. 

An immunogenic protein or peptide, or "immunogen," may be any 
protein or peptide suitable for protecting the subject against a disease, including 
but not limited to microbial, bacterial, protozoal, parasitic, and viral diseases. For 
example, the immunogen may be an orthomyxovirus immunogen (e.g., an 
influenza virus immunogen, such as the influenza virus hemagglutinin (HA) 
surface protein or the influenza virus nucleoprotein gene, or an equine influenza 
virus immunogen), or a lentivirus immunogen (e.g., an equine infectious anemia 
virus immunogen, a Simian Immunodeficiency Virus (SIV) immunogen, or a 
Human Immunodeficiency Virus (HIV) immunogen, such as the HIV envelope 
GP160 protein and the HIV matrix/capsid proteins). The immunogen may also be
an arenavirus immunogen (e.g., Lassa fever virus immunogen, such as the Lassa 
fever virus nucleocapsid protein gene and the Lassa fever envelope glycoprotein 
gene), a poxvirus immunogen (e.g., vaccinia), a flavivirus immunogen (e.g., a 
yellow fever virus immunogen or a Japanese encephalitis virus immunogen), a 
filovirus immunogen (e.g., an Ebola virus immunogen, or a Marburg virus
immunogen), a bunyavirus immunogen (e.g., RVFV, CCHF, and SFS viruses),
or a coronavirus immunogen (e.g., an infectious human coronavirus immunogen,
such as the human coronavirus envelope glycoprotein gene, or a transmissible
gastroenteritis virus immunogen for pigs, or an infectious bronchitis virus
immunogen for chickens).



Alternatively, the present invention can be used to express 
heterologous RNAs encoding antisense oligonucleotides. In general, "antisense" 
refers to the use of small, synthetic oligonucleotides to inhibit gene expression by
inhibiting the function of the target mRNA containing the complementary
sequence. Milligan, J.F. et al., J. Med. Chem. 36(14), 1923-1937 (1993). Gene
expression is inhibited through hybridization to coding (sense) sequences in a
specific mRNA target by hydrogen bonding according to Watson-Crick base
pairing rules. The mechanism of antisense inhibition is that the exogenously
applied oligonucleotides decrease the mRNA and protein levels of the target gene.
Milligan, J.F. et al., J. Med. Chem. 36(14), 1923-1937 (1993). See also Helene,
C. and Toulme, J., Biochim. Biophys. Acta 1049, 99-125 (1990); Cohen, J.S.,
Ed., OLIGODEOXYNUCLEOTIDES AS ANTISENSE INHIBITORS OF GENE
EXPRESSION, CRC Press: Boca Raton, FL (1987).
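The Watson-Crick pairing relationship between an antisense RNA and its sense target can be illustrated computationally. The sketch below is a minimal example under simplifying assumptions: it considers only the canonical A-U and G-C pairs, ignores wobble pairing and secondary structure, and uses hypothetical function names (`antisense_rna`, `is_antisense_of`) and a made-up six-nucleotide target. Sequences are written 5' to 3'.

```python
# Canonical Watson-Crick pairs for RNA; wobble (G-U) pairing is deliberately ignored.
# Function names and the example sequence are illustrative assumptions only.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(target):
    """Return the reverse complement of an RNA target, written 5' to 3'."""
    target = target.upper()
    unknown = set(target) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected nucleotide(s): {sorted(unknown)}")
    return "".join(COMPLEMENT[base] for base in reversed(target))


def is_antisense_of(candidate, target):
    """True if candidate is fully complementary to target along its whole length."""
    return candidate.upper() == antisense_rna(target)


if __name__ == "__main__":
    sense = "AUGGCU"                 # hypothetical sense (coding) fragment
    print(antisense_rna(sense))      # reverse complement, 5' to 3'
    print(is_antisense_of("AGCCAU", sense))
```

Taking the reverse complement, rather than the plain complement, reflects the antiparallel orientation in which two RNA strands hybridize.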



Antisense oligonucleotides may be of any suitable length, depending 
on the particular target being bound. The only limit on the length of the antisense
oligonucleotide is the capacity of the virus for inserted heterologous RNA.
Antisense oligonucleotides may be complementary to the entire mRNA transcript
of the target gene or only a portion thereof. Preferably the antisense
oligonucleotide is directed to an mRNA region containing a junction between
intron and exon. Where the antisense oligonucleotide is directed to an intron/exon
junction, it may either entirely overlie the junction or may be sufficiently close to
the junction to inhibit splicing out of the intervening exon during processing of
precursor mRNA to mature mRNA (e.g., with the 3' or 5' terminus of the
antisense oligonucleotide being positioned within about, for example, 10, 5, 3 or
2 nucleotides of the intron/exon junction). Also preferred are antisense
oligonucleotides which overlap the initiation codon. 



When practicing the present invention, the antisense oligonucleotides
administered may be related in origin to the species to which they are administered.
When treating humans, human antisense may be used if desired.

Promoters for use in carrying out the present invention are operable 
in bone marrow cells. An operable promoter in bone marrow cells is a promoter 
that is recognized by and functions in bone marrow cells. Promoters for use with 
the present invention must also be operatively associated with the heterologous
RNA to be expressed in the bone marrow. A promoter is operably linked to a
heterologous RNA if it controls the transcription of the heterologous RNA, where
the heterologous RNA comprises a coding sequence. Suitable promoters are well
known in the art. The Sindbis 26S promoter is preferred when the alphavirus is
a strain of Sindbis virus. Additional preferred promoters beyond the Sindbis 26S
promoter include the Girdwood S.A. 26S promoter when the alphavirus is
Girdwood S.A., the S.A.AR86 26S promoter when the alphavirus is S.A.AR86,
and any other promoter sequence recognized by alphavirus polymerases.
Alphavirus promoter sequences containing mutations which alter the activity level
of the promoter (in relation to the activity level of the wild-type) are also suitable
in the practice of the present invention. Such mutant promoter sequences are
described in Raju and Huang, J. Virol. 65, 2501-2510 (1991), the disclosure of
which is incorporated in its entirety by reference. 

The heterologous RNA is introduced into the bone marrow cells by 
contacting the recombinant alphavirus carrying the heterologous RNA segment to 
25 the bone marrow cells. By contacting, it is meant bringing the recombinant 
alphavirus and the bone marrow cells in physical proximity. The contacting step 
can be performed in vitro or in vivo. In vitro contacting can be carried out with 
cultures of immortalized or non-immortalized bone marrow cells. In one particular 
embodiment, bone marrow cells can be removed from a subject, cultured in vitro, 
infected with the vector, and then introduced back into the subject. Contacting is
performed in vivo when the recombinant alphavirus is administered to a subject.
Pharmaceutical formulations of recombinant alphavirus can be administered to a
subject parenterally (e.g., by subcutaneous, intracerebral, intradermal, intramuscular,
intravenous, or intraarticular administration). Alternatively, pharmaceutical
formulations of the present invention may be suitable for administration to the
mucous membranes of a subject (e.g., intranasal administration, by use of a
dropper, swab, or inhaler). Methods of preparing infectious virus particles and
pharmaceutical formulations thereof are discussed in more detail hereinbelow. 

By "introducing" the heterologous RNA segment into the bone 
marrow cells it is meant infecting the bone marrow cells with recombinant 
alphavirus containing the heterologous RNA, such that the viral vector carrying the 
heterologous RNA enters the bone marrow cells and can be expressed therein. As 
used with respect to the present invention, when the heterologous RNA is 
"expressed," it is meant that the heterologous RNA is transcribed. In particular 
embodiments of the invention in which it is desired to produce a protein or 
peptide, expression further includes the steps of post-transcriptional processing and 
translation of the mRNA transcribed from the heterologous RNA. In contrast, 
where the heterologous RNA encodes an antisense oligonucleotide, expression need 
not include post-transcriptional processing and translation. With respect to 
embodiments in which the heterologous RNA encodes an immunogenic protein or 
a protein being administered for therapeutic purposes, expression may also include 
the further step of post-translational processing to produce an immunogenic or 
therapeutically-active protein. 

The present invention also provides infectious RNAs, as described 
hereinabove, and cDNAs encoding the same. Preferably the infectious RNAs and 
cDNAs are derived from the S.A.AR86, Girdwood S.A., TR339, or Ockelbo 
viruses. The cDNA clones can be generated by any of a variety of suitable 
methods known to those skilled in the art. A preferred method is the method set 
forth in United States Patent No. 5,185,440 to Davis et al., the disclosure of which
is incorporated in its entirety by reference, and Gubler et al., Gene 25:263 (1983). 

RNA is preferably synthesized from the DNA sequence in vitro 
using purified RNA polymerase in the presence of ribonucleotide triphosphates and 
cap analogs in accordance with conventional techniques. However, the RNA may
also be synthesized intracellularly after introduction of the cDNA. 

A. Double Promoter Vectors. 

In one embodiment of the invention, double promoter vectors are
used to introduce the heterologous RNA into the target bone marrow cells. A
double promoter virus vector is a replication and propagation competent virus.
Double promoter vectors are described in United States Patent No. 5,505,947 to
Johnston et al., the disclosure of which is incorporated in its entirety by reference.
Preferred alphaviruses for constructing the double promoter vectors are S.A.AR86,
Girdwood S.A., TR339 and Ockelbo viruses. More preferably, the double
promoter vector contains one or more attenuating mutations. Attenuating
mutations are described in more detail hereinabove.

The double promoter vector is constructed so as to contain a second
subgenomic promoter (i.e., 26S promoter) inserted 3' to the virus RNA encoding
the structural proteins. The heterologous RNA is inserted between the second
subgenomic promoter, so as to be operatively associated therewith, and the 3'
UTR of the virus genome. Heterologous RNA sequences of less than 3 kilobases,
more preferably those less than 2 kilobases, and more preferably still those less
than 1 kilobase, can be inserted into the double promoter vector. In a preferred
embodiment of the invention, the double promoter vector is derived from
Girdwood S.A., and the second subgenomic promoter is a duplicate of the
Girdwood S.A. subgenomic promoter. In an alternate preferred embodiment, the
double promoter vector is derived from TR339, and the second subgenomic
promoter is a duplicate of the TR339 subgenomic promoter.
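
The segment order of a double promoter vector, and the stated size preferences for the heterologous insert, can be sketched as follows. This is an illustrative model only: the function names `size_preference` and `build_double_promoter_layout` and the segment labels are assumptions introduced for this sketch. Only the placement of the second promoter and insert, and the 3, 2 and 1 kilobase thresholds, are taken from the description above.

```python
# Hypothetical sketch of the double promoter vector layout described above.
# Function names and segment labels are illustrative assumptions.

def size_preference(insert_length_nt: int) -> str:
    """Classify a heterologous insert by the size preferences stated in the text."""
    if insert_length_nt < 1000:
        return "more preferred still (< 1 kb)"
    if insert_length_nt < 2000:
        return "more preferred (< 2 kb)"
    if insert_length_nt < 3000:
        return "suitable (< 3 kb)"
    return "outside the stated range (>= 3 kb)"


def build_double_promoter_layout(insert_name: str) -> list[str]:
    """Return the 5'-to-3' order of segments in a double promoter vector."""
    return [
        "5' UTR",
        "nonstructural genes (nsP1-nsP4)",
        "first 26S subgenomic promoter",
        "structural protein genes",
        "second 26S subgenomic promoter",  # inserted 3' to the structural genes
        insert_name,                       # heterologous RNA
        "3' UTR",
    ]


layout = build_double_promoter_layout("heterologous RNA")
print(" | ".join(layout))
print(size_preference(850))
```

The list form makes the key constraint easy to check: the heterologous RNA sits between the second subgenomic promoter and the 3' UTR.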

B. Replicon Vectors.

Replicon vectors, which are propagation-defective virus vectors, can
also be used to carry out the present invention. Replicon vectors are described in 
more detail in copending United States Application 08/448,630 to Johnston et al., 
the disclosure of which is incorporated in its entirety by reference. Preferred 
alphaviruses for constructing the replicon vectors are S.A.AR86, Girdwood S.A., 
TR339, and Ockelbo. 

In general, in the replicon system, a foreign gene to be expressed 
is inserted in place of at least one of the viral structural protein genes in a 
transcription plasmid containing an otherwise full-length cDNA copy of the 
alphavirus genome RNA. RNA transcribed from this plasmid contains an intact 
copy of the viral nonstructural genes which are responsible for RNA replication 
and transcription. Thus, if the transcribed RNA is transfected into susceptible 
cells, it will be replicated and translated to give the nonstructural proteins. These 
proteins will transcribe the transfected RNA to give high levels of subgenomic 
mRNA, which will then be translated to produce high levels of the foreign protein. 
The autonomously replicating RNA (i.e., replicon) can only be packaged into virus
particles if the alphavirus structural protein genes are provided on one or more 
"helper" RNAs, which are cotransfected into cells along with the replicon RNA. 
The helper RNAs do not contain the viral nonstructural genes for replication, but 
these functions are provided in trans by the replicon RNA. Similarly, the 
transcriptase functions translated from the replicon RNA transcribe the structural 
protein genes on the helper RNA, resulting in the synthesis of viral structural 
proteins and packaging of the replicon into virus-like particles. As the packaging 
or encapsidation signal for alphavirus RNAs is located within the nonstructural 
genes, the absence of these sequences in the helper RNAs precludes their 
incorporation into virus particles. 
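
The packaging logic of the replicon system can be summarized in a small model. The sketch below is a simplification under stated assumptions: the class `Rna`, the function `packaged_rnas`, and the yes/no treatment of replication and packaging are hypothetical conveniences, not part of the disclosure. It encodes only two points from the text: the packaging signal lies within the nonstructural genes, and packaging requires that the structural proteins be supplied in trans.

```python
from dataclasses import dataclass, field

# Structural proteins named in the text as required for encapsidation.
REQUIRED_STRUCTURAL = frozenset({"capsid", "E1", "E2"})


@dataclass
class Rna:
    name: str
    has_nonstructural_genes: bool
    structural_proteins: frozenset = field(default_factory=frozenset)

    @property
    def has_packaging_signal(self) -> bool:
        # The text places the encapsidation signal within the nonstructural genes.
        return self.has_nonstructural_genes


def packaged_rnas(rnas):
    """Return names of the RNAs expected to be packaged after co-transfection."""
    replicase_present = any(r.has_nonstructural_genes for r in rnas)
    provided = frozenset().union(*(r.structural_proteins for r in rnas))
    if not replicase_present or not REQUIRED_STRUCTURAL <= provided:
        return []  # no replication, or an incomplete set of structural proteins
    return [r.name for r in rnas if r.has_packaging_signal]


replicon = Rna("replicon", has_nonstructural_genes=True)
helper = Rna("helper", False, frozenset({"capsid", "E1", "E2"}))

print(packaged_rnas([replicon, helper]))  # ['replicon']
print(packaged_rnas([replicon]))          # []
```

In this model the helper RNA is never packaged, because it lacks the nonstructural genes and therefore the packaging signal.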

Alphavirus-permissive cells employed in the methods of the present 
invention are cells which, upon transfection with the viral RNA transcript, are 
capable of producing viral particles. Preferred alphavirus-permissive cells are 
TR339-permissive cells, Girdwood S.A.-permissive cells, S.A.AR86-permissive cells, and
Ockelbo-permissive cells. Alphaviruses have a broad host range. Examples of suitable
host cells include, but are not limited to, Vero cells, baby hamster kidney (BHK) cells, and
chicken embryo fibroblast cells. 

The phrase "structural protein" as used herein refers to the encoded
proteins which are required for encapsidation (e.g., packaging) of the RNA replicon, and
include the capsid protein, E1 glycoprotein, and E2 glycoprotein. As described
hereinabove, the structural proteins of the alphavirus are distributed among one or more
helper RNAs (i.e., a first helper RNA and a second helper RNA). In addition, one or
more structural proteins may be located on the same RNA molecule as the replicon RNA,
provided that at least one structural protein is deleted from the replicon RNA such that the
resulting alphavirus particle is propagation defective. As used herein, the terms "deleted"
or "deletion" mean either total deletion of the specified segment or the deletion of a
sufficient portion of the specified segment to render the segment inoperative or
nonfunctional, in accordance with standard usage. See, e.g., U.S. Patent No. 4,650,764
to Temin et al. The term "propagation defective" as used herein, means that the replicon
RNA cannot be encapsidated in the host cell in the absence of the helper RNA. The
resulting alphavirus replicon particles are propagation defective inasmuch as the replicon
RNA in these particles does not include all of the alphavirus structural proteins required
for encapsidation, at least one of the required structural proteins being deleted therefrom,
such that the replicon RNA initiates only an abortive infection; no new viral particles are
produced, and there is no spread of the infection to other cells.

The helper cell for expressing the infectious, propagation defective alphavirus 
particle comprises a set of RNAs, as described above. The set of RNAs principally 
includes a first helper RNA and a second helper RNA. The first helper RNA includes
RNA encoding at least one alphavirus structural protein but does not encode all alphavirus 
structural proteins. In other words, the first helper RNA does not encode at least one 
alphavirus structural protein; the at least one non-coded alphavirus structural protein being 
deleted from the first helper RNA. 

In one embodiment, the first helper RNA includes RNA encoding the alphavirus
E1 glycoprotein, with the alphavirus capsid protein and the alphavirus E2
glycoprotein being deleted from the first helper RNA. In another embodiment, the
first helper RNA includes RNA encoding the alphavirus E2 glycoprotein, with the
alphavirus capsid protein and the alphavirus E1 glycoprotein being deleted from
the first helper RNA. In a third, preferred embodiment, the first helper RNA
includes RNA encoding the alphavirus E1 glycoprotein and the alphavirus E2
glycoprotein, with the alphavirus capsid protein being deleted from the first helper
RNA.



The second helper RNA includes RNA encoding at least one
alphavirus structural protein which is different from the at least one structural
protein encoded by the first helper RNA. Thus, the second helper RNA encodes
at least one alphavirus structural protein which is not encoded by the first helper
RNA. The second helper RNA does not encode the at least one alphavirus
structural protein which is encoded by the first helper RNA; thus, the first and
second helper RNAs do not encode duplicate structural proteins. In the
embodiment wherein the first helper RNA includes RNA encoding only the
alphavirus E1 glycoprotein, the second helper RNA may include RNA encoding
one or both of the alphavirus capsid protein and the alphavirus E2 glycoprotein
which are deleted from the first helper RNA. In the embodiment wherein the first
helper RNA includes RNA encoding only the alphavirus E2 glycoprotein, the
second helper RNA may include RNA encoding one or both of the alphavirus
capsid protein and the alphavirus E1 glycoprotein which are deleted from the first
helper RNA. In the embodiment wherein the first helper RNA includes RNA
encoding both the alphavirus E1 glycoprotein and the alphavirus E2 glycoprotein,
the second helper RNA may include RNA encoding the alphavirus capsid protein
which is deleted from the first helper RNA.
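
The rules governing how the structural proteins are divided between the two helper RNAs can be expressed as a short consistency check. The sketch below is illustrative only: the function name `check_helper_split` and the wording of its messages are assumptions made here. The rules it applies are those stated above: each helper encodes at least one structural protein, the first helper does not encode all of them, and the two helpers do not encode duplicates.

```python
# Hypothetical consistency check for a two-helper split of the structural proteins.
STRUCTURAL = frozenset({"capsid", "E1", "E2"})


def check_helper_split(first_helper, second_helper):
    """Return a list of rule violations; an empty list means the split is consistent."""
    first, second = set(first_helper), set(second_helper)
    problems = []
    unknown = (first | second) - STRUCTURAL
    if unknown:
        problems.append(f"unknown structural proteins: {sorted(unknown)}")
    if not first or not second:
        problems.append("each helper RNA must encode at least one structural protein")
    if first >= STRUCTURAL:
        problems.append("the first helper RNA must not encode all structural proteins")
    if first & second:
        problems.append(f"duplicated on both helpers: {sorted(first & second)}")
    return problems


# The third, preferred embodiment: E1 and E2 on the first helper, capsid on the second.
print(check_helper_split({"E1", "E2"}, {"capsid"}))
# A split that repeats E1 on both helpers is reported as inconsistent.
print(check_helper_split({"E1"}, {"E1", "capsid"}))
```

The check deliberately does not require the two helpers to cover all three proteins, since the text allows a remaining structural protein to be carried on the replicon RNA.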

In one embodiment, the packaging segment (RNA comprising the 
encapsidation or packaging signal) is deleted from at least the first helper RNA. 
In a preferred embodiment, the packaging segment is deleted from both the first
helper RNA and the second helper RNA. 

In the preferred embodiment wherein the packaging segment is 
deleted from both the first helper RNA and the second helper RNA, the helper cell
is co-transfected with a replicon RNA in addition to the first helper RNA and the 
second helper RNA. The replicon RNA encodes the packaging segment and an 
inserted heterologous RNA. The inserted heterologous RNA may be RNA 
encoding a protein or a peptide. In a preferred embodiment, the replicon RNA, 
the first helper RNA and the second helper RNA are provided on separate 
molecules such that a first molecule, i.e., the replicon RNA, includes RNA 
encoding the packaging segment and the inserted heterologous RNA, a second 
molecule, i.e., the first helper RNA, includes RNA encoding at least one but not 
all of the required alphavirus structural proteins, and a third molecule, i.e., the 
second helper RNA, includes RNA encoding at least one but not all of the required 
alphavirus structural proteins. For example, in one preferred embodiment of the 
present invention, the helper cell includes a set of RNAs which include (a) a 
replicon RNA including RNA encoding an alphavirus packaging sequence and an 
inserted heterologous RNA, (b) a first helper RNA including RNA encoding the 
alphavirus E1 glycoprotein and the alphavirus E2 glycoprotein, and (c) a second
helper RNA including RNA encoding the alphavirus capsid protein so that the
alphavirus E1 glycoprotein, the alphavirus E2 glycoprotein and the capsid protein
assemble together into alphavirus particles in the host cell. 

In an alternate embodiment, the replicon RNA and the first helper
RNA are on separate molecules, and the replicon RNA and RNA encoding a
structural gene not encoded by the first helper RNA are on another single molecule
together, such that a first molecule, i.e., the first helper RNA, includes RNA
encoding at least one but not all of the required alphavirus structural proteins, and
a second molecule, i.e., the replicon RNA, includes RNA encoding the packaging
segment, the inserted heterologous RNA, and the remaining structural proteins not
encoded by the first helper RNA. For example, in one preferred embodiment of
the present invention, the helper cell includes a set of RNAs including (a) a
replicon RNA including RNA encoding an alphavirus packaging sequence, an
inserted heterologous RNA, and an alphavirus capsid protein, and (b) a first helper
RNA including RNA encoding the alphavirus E1 glycoprotein and the alphavirus
E2 glycoprotein so that the alphavirus E1 glycoprotein, the alphavirus E2
glycoprotein and the capsid protein assemble together into alphavirus particles in
the host cell, with the replicon RNA packaged therein.

In one preferred embodiment of the present invention, the RNA 
encoding the alphavirus structural proteins, i.e., the capsid, E1 glycoprotein and
E2 glycoprotein, contains at least one attenuating mutation, as described 
hereinabove. Thus, according to this embodiment, at least one of the first helper 
RNA and the second helper RNA includes at least one attenuating mutation. In 
a more preferred embodiment, at least one of the first helper RNA and the second 
helper RNA includes at least two, or multiple, attenuating mutations. The multiple 
attenuating mutations may be positioned in either the first helper RNA or in the
second helper RNA, or they may be distributed randomly with one or more 
attenuating mutations being positioned in the first helper RNA and one or more 
attenuating mutations positioned in the second helper RNA. Alternatively, when 
the replicon RNA and the RNA encoding the structural proteins not encoded by 
the first helper RNA are located on the same molecule, an attenuating mutation
may be positioned in the RNA which codes for the structural protein not encoded 
by the first helper RNA. The attenuating mutations may also be located within the 
RNA encoding non-structural proteins (e.g., the replicon RNA).

Preferably, the first helper RNA and the second helper RNA also 
include a promoter. It is also preferred that the replicon RNA also includes a
promoter. Suitable promoters for inclusion in the first helper RNA, second helper 
RNA and replicon RNA are well known in the art. One preferred promoter is the 
Girdwood S.A. 26S promoter for use when the alphavirus is Girdwood S.A. 
Another preferred promoter is the TR339 26S promoter for use when the 
alphavirus is TR339. Additional promoters beyond the Girdwood S.A. and TR339
promoters include the VEE 26S promoter, the Sindbis 26S promoter, the Semliki
Forest 26S promoter, and any other promoter sequence recognized by alphavirus 
polymerases. Alphavirus promoter sequences containing mutations which alter the 
activity level of the promoter (in relation to the activity level of the wild-type) are 
also suitable in the practice of the present invention. Such mutant promoter 
sequences are described in Raju and Huang, J. Virol. 65, 2501-2510 (1991), the
disclosure of which is incorporated herein in its entirety. In the system wherein 
the first helper RNA, the second helper RNA, and the replicon RNA are all on 
separate molecules, the promoters, if the same promoter is used for all three 
RNAs, provide a homologous sequence between the three molecules. It is 
preferred that the selected promoter is operative with the non-structural proteins 
encoded by the replicon RNA molecule. 

In cases where vaccination with two immunogens provides improved 
protection against disease as compared to vaccination with only a single 
immunogen, a double-promoter replicon would ensure that both immunogens are 
produced in the same cell. Such a replicon would be the same as the one 
described above, except that it would contain two copies of the 26S RNA 
promoter, each followed by a different multiple cloning site, to allow for the 
insertion and expression of two different heterologous proteins. Another useful 
strategy is to insert the IRES sequence from the picornavirus, EMC virus, between 
the two heterologous genes downstream from the single 26S promoter of the 
replicon described above, thus leading to expression of two immunogens from the 
single replicon transcript in the same cell. 

C. Uses of the Present Invention. 

The alphavirus vectors, RNAs, cDNAs, helper cells, infectious virus 
particles, and methods of the present invention find use in in vitro expression 
systems, wherein the inserted heterologous RNA encodes a protein or peptide 
which is desirably produced in vitro. The RNAs, cDNAs, helper cells, infectious 
virus particles, methods, and pharmaceutical formulations of the present invention 
are additionally useful in a method of administering a protein or peptide to a 
subject in need of the protein or peptide, as a method of treatment or otherwise.
In this embodiment of the invention, the heterologous RNA encodes the desired 
protein or peptide, and pharmaceutical formulations of the present invention are 
administered to a subject in need of the desired protein or peptide. In this manner,
the protein or peptide may thus be produced in vivo in the subject. The subject
may be in need of the protein or peptide because the subject has a deficiency 
thereof, or because the production of the protein or peptide in the subject may 
impart some therapeutic effect, as a method of treatment or otherwise. 



Alternately, the claimed methods provide a vaccination strategy, 
wherein the heterologous RNA encodes an immunogenic protein or peptide.

The methods and products of the invention are also useful as 
antigens and for evoking the production of antibodies in animals such as horses 
and rabbits, from which the antibodies may be collected and then used in 
diagnostic assays in accordance with known techniques.

A further aspect of the present invention is a method of introducing
and expressing antisense oligonucleotides in bone marrow cell cultures to regulate
gene expression. Alternately, the claimed method finds use in introducing and 
expressing a protein or peptide in bone marrow cell cultures. 

II. Girdwood S.A. and TR339 Clones.

Disclosed hereinbelow are genomic RNA sequences encoding live
Girdwood S.A. virus, live S.A.AR86 virus, and live Sindbis strain TR339 virus,
cDNAs derived therefrom, infectious RNA transcripts encoded by the cDNAs, 
infectious viral particles containing the infectious RNA transcripts, and 
pharmaceutical formulations derived therefrom. 

The cDNA sequence of Girdwood S.A. is given herein as SEQ ID
NO:4. Alternatively, the cDNA may have a sequence which differs from the
cDNA of SEQ ID NO:4, but which has the same protein sequence as the cDNA
given herein as SEQ ID NO:4. Thus, the cDNA may include one or more silent
mutations.

WO 98/36779    PCT/US98/02945

The phrase "silent mutation" as used herein refers to mutations in 
the cDNA coding sequence which do not produce mutations in the corresponding 
protein sequence translated therefrom. 
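
Whether two coding sequences differ only by silent mutations can be tested by translating both and comparing the results. The following is a minimal sketch that assumes the standard genetic code and an open reading frame beginning at the first nucleotide. The function names and the short example sequences are hypothetical and are not taken from SEQ ID NO:4 or SEQ ID NO:8.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a cDNA coding sequence in frame 1; '*' marks a stop codon."""
    cdna = cdna.upper()
    usable = len(cdna) - len(cdna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, usable, 3))


def differs_only_by_silent_mutations(reference: str, variant: str) -> bool:
    """True if the sequences differ at the nucleotide level but encode the same protein."""
    return reference.upper() != variant.upper() and translate(reference) == translate(variant)


reference = "ATGGCTTAA"  # toy sequence: Met-Ala-stop
print(differs_only_by_silent_mutations(reference, "ATGGCCTAA"))  # GCT -> GCC, still Ala
print(differs_only_by_silent_mutations(reference, "ATGGATTAA"))  # GCT -> GAT, Ala -> Asp
```

The first comparison prints True and the second prints False, since only the first change leaves the translated protein sequence unaltered.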

Likewise, the cDNA sequence of TR339 is given herein as SEQ ID 
NO:8. Alternatively, the cDNA may have a sequence which differs from the 
cDNA of SEQ ID NO:8, but which has the same protein sequence as the cDNA 
given herein as SEQ ID NO:8. Thus, the cDNA may include one or more silent 
mutations. 

The cDNAs encoding infectious Girdwood S.A. and TR339 virus 
RNA transcripts of the present invention include those homologous to, and having 
essentially the same biological properties as, the cDNA sequences disclosed herein 
as SEQ ID NO:4 and SEQ ID NO:8, respectively. Thus, cDNAs that hybridize 
to cDNAs encoding infectious Girdwood S.A. or TR339 virus RNA transcripts 
disclosed herein are also an aspect of this invention. Conditions which will permit 
other cDNAs encoding infectious Girdwood S.A. or TR339 virus transcripts to 
hybridize to the cDNAs disclosed herein can be determined in accordance with 
known techniques. For example, hybridization of such sequences may be carried
out under conditions of reduced stringency, medium stringency, or even high
stringency conditions (e.g., conditions represented by a wash stringency of 35-40%
formamide with 5X Denhardt's solution, 0.5% SDS and 1X SSPE at 37°C;
conditions represented by a wash stringency of 40-45% formamide with 5X
Denhardt's solution, 0.5% SDS, and 1X SSPE at 42°C; and conditions represented
by a wash stringency of 50% formamide with 5X Denhardt's solution, 0.5% SDS
and 1X SSPE at 42°C, respectively, to cDNA encoding infectious Girdwood S.A.
or TR339 virus RNA transcripts disclosed herein in a standard hybridization assay. 
See J. SAMBROOK ET AL., MOLECULAR CLONING: A LABORATORY 
MANUAL (2d ed. 1989)). In general, cDNA sequences encoding infectious 
Girdwood S.A. or TR339 virus RNA transcripts that hybridize to the cDNAs
disclosed herein will be at least 30% homologous, 50% homologous, 75% 
homologous, and even 95% homologous or more with the cDNA sequences 
encoding infectious Girdwood S.A. or TR339 virus RNA transcripts disclosed 
herein. 



Promoter sequences and Girdwood S.A. virus or Sindbis virus strain 
TR339 cDNA clones are operatively associated in the present invention such that 
the promoter causes the cDNA clone to be transcribed in the presence of an RNA
polymerase which binds to the promoter. The promoter is positioned on the 5' end
(with respect to the virion RNA sequence) of the cDNA clone. An excessive
number of nucleotides between the promoter sequence and the cDNA clone will 
result in the inoperability of the construct. Hence, the number of nucleotides 
between the promoter sequence and the cDNA clone is preferably not more than 
eight, more preferably not more than five, still more preferably not more than 
three, and most preferably not more than one. 

Examples of promoters which are useful in the cDNA sequences of 
the present invention include, but are not limited to, T3 promoters, T7 promoters,
cytomegalovirus (CMV) promoters, and SP6 promoters. The DNA sequence of 
the present invention may reside in any suitable transcription vector. The DNA 
sequence preferably has a complementary DNA sequence bound thereto so that the 
double-stranded sequence will serve as an active template for RNA polymerase. 
The transcription vector preferably comprises a plasmid. When the DNA sequence 
comprises a plasmid, it is preferred that a unique restriction site be provided 3' 
(with respect to the virion RNA sequence) to the cDNA clone. This provides a 
means for linearizing the DNA sequence to allow the transcription of genome- 
length RNA in vitro. 
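
Two design constraints on the transcription plasmid lend themselves to a simple check: the number of nucleotides between the promoter and the cDNA clone, and the presence of a unique restriction site 3' to the cDNA for linearization. The sketch below is illustrative only. The function names, the placeholder sequences, and the choice of GAATTC (the EcoRI recognition sequence) as the example site are assumptions made here, and the plasmid is treated as a simple linear string scanned on one strand.

```python
def spacer_preference(n_nucleotides: int) -> str:
    """Map a promoter-to-cDNA spacer length onto the preference tiers in the text."""
    if n_nucleotides <= 1:
        return "most preferred"
    if n_nucleotides <= 3:
        return "still more preferred"
    if n_nucleotides <= 5:
        return "more preferred"
    if n_nucleotides <= 8:
        return "preferred"
    return "not preferred"


def can_linearize(plasmid: str, site: str, cdna_end: int) -> bool:
    """True if `site` occurs exactly once and starts at or after the 3' end of the cDNA."""
    plasmid, site = plasmid.upper(), site.upper()
    first = plasmid.find(site)
    if first == -1:
        return False
    unique = plasmid.find(site, first + 1) == -1
    return unique and first >= cdna_end


promoter = "AAAAAA"    # placeholder, not a real promoter sequence
spacer = "G"
cdna = "ATGACCATGACC"  # placeholder, not SEQ ID NO:4 or SEQ ID NO:8
site = "GAATTC"        # example recognition sequence (EcoRI)
plasmid = promoter + spacer + cdna + "CC" + site + "TT"
cdna_end = len(promoter) + len(spacer) + len(cdna)

print(spacer_preference(len(spacer)))
print(can_linearize(plasmid, site, cdna_end))
```

A site that also occurs within the cDNA fails the check, because cutting there would truncate the template rather than yield genome-length RNA.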



The cDNA clones can be generated by any of a variety of suitable 
methods known to those skilled in the art. A preferred method is the method set 
forth in United States Patent No. 5,185,440 to Davis et al., the disclosure of which 
is incorporated in its entirety by reference, and Gubler et al., Gene 25:263 (1983).

RNA is preferably synthesized from the DNA sequence in vitro 
using purified RNA polymerase in the presence of ribonucleotide triphosphates and 
cap analogs in accordance with conventional techniques. However, the RNA may 
also be synthesized intracellularly after introduction of the cDNA. 



The Girdwood S.A. and TR339 cDNA clones and the infectious 
RNAs and infectious virus particles produced therefrom of the present invention 
are useful for the preparation of pharmaceutical formulations, such as vaccines. 
In addition, the cDNA clones, infectious RNAs, and infectious viral particles of
the present invention are useful for administration to animals for the purpose of
producing antibodies to the Girdwood S.A. virus or the Sindbis virus strain 
TR339, which antibodies may be collected and used in known diagnostic 
techniques for the detection of Girdwood S.A. virus or Sindbis virus strain TR339. 
Antibodies can also be generated to the viral proteins expressed from the cDNAs
disclosed herein. As another aspect of the present invention, the claimed cDNA
clones are useful as nucleotide probes to detect the presence of Girdwood S.A. or 
TR339 genomic RNA or transcripts. 



III. Infectious Virus Particles and Pharmaceutical Formulations.

The infectious virus particles of the present invention include those 
containing double promoter vectors and those containing replicon vectors as 
described hereinabove. Alternately, the infectious virus particles contain infectious 
RNAs encoding the Girdwood S.A. or TR339 genome. When the infectious RNA 
comprises the Girdwood S.A. genome, preferably the RNA has the sequence 
encoded by the cDNA given as SEQ ID NO:4. When the infectious RNA 
comprises the TR339 genome, preferably the RNA has the sequence encoded by
the cDNA given as SEQ ID NO:8. 

The infectious alphavirus particles of the present invention may be
prepared according to the methods disclosed herein in combination with techniques 
known to those skilled in the art. These methods include transfecting an
alphavirus-permissive cell with a replicon RNA including the alphavirus packaging 
segment and an inserted heterologous RNA, a first helper RNA including RNA 
encoding at least one alphavirus structural protein, and a second helper RNA 
including RNA encoding at least one alphavirus structural protein which is 
different from that encoded by the first helper RNA. Alternately, and preferably, 
at least one of the helper RNAs is produced from a cDNA encoding the helper 
RNA and operably associated with an appropriate promoter, the cDNA being 
stably transfected and integrated into the cells. More preferably, all of the helper 
RNAs will be "launched" from stably transfected cDNAs. The step of transfecting 
the alphavirus-permissive cell can be carried out according to any suitable means 
known to those skilled in the art, as described above with respect to propagation- 
competent viruses. 



Uptake of propagation-competent RNA into the cells in vitro can be 
carried out according to any suitable means known to those skilled in the art. 
Uptake of RNA into the cells can be achieved, for example, by treating the cells 
with DEAE-dextran, treating the RNA with LIPOFECTIN® before addition to the 
cells, or by electroporation, with electroporation being the currently preferred 
means. These techniques are well known in the art. See e.g., United States 
Patent No. 5,185,440 to Davis et al., and PCT Publication No. WO 92/10578 to
Bioption AB, the disclosures of which are incorporated herein by reference in their 
entirety. Uptake of propagation-competent RNA into the cell in vivo can be 
carried out by administering the infectious RNA to a subject as described in 
Section I above. 



The infectious RNAs may also contain a heterologous RNA 
segment, where the heterologous RNA segment contains a heterologous RNA and 
a promoter operably associated therewith. It is preferred that the infectious RNA 
introduces and expresses the heterologous RNA in bone marrow cells as described 
in Section I above. According to this embodiment, it is preferable that the 
promoter operatively associated with the heterologous RNA is operable in bone 
marrow cells. The heterologous RNA may encode any protein or peptide,
preferably an immunogenic protein or peptide, a therapeutic protein or peptide, a 
hormone, a growth factor, an interleukin, a cytokine, a chemokine, an enzyme, 
a ribozyme, or an antisense oligonucleotide as described in more detail in Section 
I above.



The step of facilitating the production of the infectious viral particles 
in the cells may be carried out using conventional techniques. See e.g., United 
States Patent No. 5,185,440 to Davis et al., PCT Publication No. WO 92/10578 
to Bioption AB, and United States Patent No. 4,650,764 to Temin et al. (although 
Temin et al. relates to retroviruses rather than alphaviruses). The infectious viral
particles may be produced by standard cell culture growth techniques. 

The step of collecting the infectious virus particles may also be 
carried out using conventional techniques. For example, the infectious particles 
may be collected by cell lysis, or collection of the supernatant of the cell culture,
as is known in the art. See e.g., United States Patent No. 5,185,440 to Davis et
al., PCT Publication No. WO 92/10578 to Bioption AB, and United States Patent 
No. 4,650,764 to Temin et al. Other suitable techniques will be known to those 
skilled in the art. Optionally, the collected infectious virus particles may be 
purified if desired. Suitable purification techniques are well known to those skilled
in the art.

Pharmaceutical formulations, such as vaccines, of the present
invention comprise an immunogenic amount of the infectious virus particles in
combination with a pharmaceutically acceptable carrier. An "immunogenic
amount" is an amount of the infectious virus particles which is sufficient to evoke
an immune response in the subject to which the pharmaceutical formulation is
administered. An amount of from about 10^3 to about 10^7 particles, and preferably
about 10^4 to 10^6 particles per dose is believed suitable, depending upon the age and
species of the subject being treated, and the immunogen against which the immune 
response is desired. 

Pharmaceutical formulations of the present invention for therapeutic
use comprise a therapeutic amount of the infectious virus particles in combination
with a pharmaceutically acceptable carrier. A "therapeutic amount" is an amount
of the infectious virus particles which is sufficient to produce a therapeutic effect
(e.g., triggering an immune response or supplying a protein to a subject in need
thereof) in the subject to which the pharmaceutical formulation is administered.
The therapeutic amount will depend upon the age and species of the subject being
treated, and the therapeutic protein or peptide being administered. Typical dosages
are an amount from about 10' to about 10 s infectious units. 

Exemplary pharmaceutically acceptable carriers include, but are not
limited to, sterile pyrogen-free water and sterile pyrogen-free physiological saline
solution. Subjects which may be administered immunogenic amounts of the
infectious virus particles of the present invention include but are not limited to
human and animal (e.g., pig, cattle, dog, horse, donkey, mouse, hamster,
monkey) subjects.

Pharmaceutical formulations of the present invention include those
suitable for parenteral (e.g., subcutaneous, intracerebral, intradermal,
intramuscular, intravenous and intraarticular) administration. Alternatively,
pharmaceutical formulations of the present invention may be suitable for
administration to the mucous membranes of a subject (e.g., intranasal administration
by use of a dropper, swab, or inhaler). The formulations may be conveniently
prepared in unit dosage form and may be prepared by any of the methods well
known in the art.

The following examples are provided to illustrate the present
invention, and should not be construed as limiting thereof. In these examples,
PBS means phosphate buffered saline, EDTA means ethylene diamine tetraacetate,
ml means milliliter, µl means microliter, mM means millimolar, µM means
micromolar, u means unit, PFU means plaque forming units, g means gram, mg
means milligram, µg means microgram, cpm means counts per minute, ic means
intracerebral or intracerebrally, ip means intraperitoneal or intraperitoneally, iv
means intravenous or intravenously, and sc means subcutaneous or subcutaneously.

Amino acid sequences disclosed herein are presented in the amino 
to carboxyl direction, from left to right. The amino and carboxyl groups are not 
presented in the sequence. Nucleotide sequences are presented herein by single 
strand only in the 5' to 3' direction, from left to right. Nucleotides and amino 
acids are represented herein in the manner recommended by the IUPAC-IUB 
Biochemical Nomenclature Commission, or (for amino acids) by either one letter 
or three letter code, in accordance with 37 CFR § 1.822 and established usage. 
Where one letter amino acid code is used, the same sequence is also presented 
elsewhere in three letter code. 
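As an illustration of the notation only, the following minimal Python sketch converts a one letter amino acid sequence, read in the amino to carboxyl direction, into three letter code. The function name and the example peptide are hypothetical and are not taken from the sequence listing.

```python
# Standard one letter to three letter codes for the twenty common amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(seq: str, sep: str = " ") -> str:
    """Convert a one letter sequence (amino to carboxyl, left to right)
    into three letter code. Raises KeyError on any other symbol."""
    return sep.join(ONE_TO_THREE[residue] for residue in seq.upper())


# Hypothetical two-residue example, not a disclosed sequence.
print(to_three_letter("TI"))
```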

EXAMPLE 1
Cells and Virus Stocks
S.A.AR86 was isolated in 1954 from a pool of Culex sp. mosquitoes
collected near Johannesburg, South Africa. Weinbren et al., S. Afr. Med. J. 30,
631-36 (1956). Ockelbo82 was isolated from Culiseta sp. mosquitoes collected in
Edsbyn, Sweden in 1982 and was associated serologically with human disease.
Niklasson et al., Am. J. Trop. Med. Hyg. 33, 1212-17 (1984). Girdwood S.A.
was isolated from a human patient in the Johannesburg area of South Africa in
1963. Malherbe et al., S. Afr. Med. J. 37, 547-52 (1963). Molecularly cloned
virus TR339 represents the deduced consensus sequence of Sindbis AR339.
McKnight et al., J. Virol. 70, 1981-89 (1996); William Klimstra, personal
communication. TRSB is a laboratory strain of Sindbis isolate AR339 derived
from a cDNA clone pTRSB and differing from the AR339 consensus sequence at
three codons. McKnight et al., J. Virol. 70, 1981-89 (1996). pTR5000 is a full-
length cDNA clone of Sindbis AR339 following the SP6 phage promoter and
containing mostly Sindbis AR339 sequences.

Stocks of all molecularly cloned viruses were prepared by
electroporating genome-length in vitro transcripts of their respective cDNA clones
into BHK-21 cells. Heidner et al., J. Virol. 68, 2683-92 (1994). Girdwood S.A.
(Malherbe et al., S. Afr. Med. J. 37, 547-52 (1963)) and Ockelbo82 (Espmark and
Niklasson, Am. J. Trop. Med. Hyg. 33, 1203-11 (1984); Niklasson et al., Am. J.
Trop. Med. Hyg. 33, 1212-17 (1984)) were passed one to three times in BHK-21
cells in order to produce amplified stocks of virus. All virus stocks were
stored at -70°C until needed. The titers of the virus stocks were determined on
BHK-21 cells from aliquots of frozen virus.

EXAMPLE 2
Cloning the S.A.AR86 and Girdwood S.A. Genomic Sequences
The sequences of S.A.AR86 (Figure 1, SEQ ID NO:1) and
Girdwood S.A. (Figure 3, SEQ ID NO:4) were determined from uncloned reverse
transcriptase-polymerase chain reaction (RT-PCR) fragments amplified from virion
RNA. Heidner et al., J. Virol. 68, 2683-92 (1994). The sequence of the 5' 40
nucleotides was determined by directly sequencing the genomic RNA. Sanger et
al., Proc. Natl. Acad. Sci. USA 74, 5463-67 (1977); Zimmern and Kaesberg,
Proc. Natl. Acad. Sci. USA 75, 4257-61 (1978); Ahlquist et al., Cell 23, 183-89
(1981).

The S.A.AR86 genome was 11,663 nucleotides in length, excluding
the 5' CAP and 3' poly(A) tail, 40 nucleotides shorter than the alphavirus prototype
Sindbis strain AR339. Strauss et al., Virology 133, 92-110 (1984). Compared
with the consensus sequence of Sindbis virus AR339 (McKnight et al., J. Virol.
70, 1981-89 (1996)), S.A.AR86 contained two separate 6-nucleotide insertions and
one 3-nucleotide insertion in the 3' half of the nsP3 gene, a region not well
conserved among alphaviruses. The two 6-nucleotide insertions were found
immediately 3' of nucleotides 5403 and 5450, and the 3-nucleotide insertion was
immediately 3' of nucleotide 5546 compared with the AR339 genome. In addition,
S.A.AR86 contained a 54-nucleotide deletion in nsP3 which spanned nucleotides
5256 to 5311 of AR339. As a result of these deletions and insertions, S.A.AR86
nsP3 was 13 amino acids smaller than AR339, containing an 18-amino acid
deletion and a total of 5 amino acids inserted. The 3' untranslated region of
S.A.AR86 contained, with respect to AR339, two 1-nucleotide deletions at
nucleotides 11,513 and 11,602, and one 1-nucleotide insertion following nucleotide
11,664. The total numbers of nucleotides and predicted amino acids comprising
the remaining genes of S.A.AR86 were identical to those of AR339.

A notable feature of the deduced amino acid sequence of S.A.AR86
(Figure 2, SEQ ID NO:2 and SEQ ID NO:3) was the cysteine codon in place of
an opal termination codon between nsP3 and nsP4. S.A.AR86 is the only
alphavirus of the Sindbis group, and one of just three alphavirus isolates sequenced
to date, which do not contain an opal termination codon between nsP3 and nsP4.
Takkinen, K., Nucleic Acids Res. 14, 5667-5682 (1986); Strauss et al., Virology
164, 265-74 (1988).

The genome of Girdwood S.A. was 11,717 nucleotides long
excluding the 5' CAP and 3' poly(A) tail. The nucleotide sequence (SEQ ID
NO:4) of the Girdwood S.A. genome and the putative amino acid sequence (SEQ
ID NO:5 and SEQ ID NO:6) of the Girdwood S.A. gene products are shown in
Figure 3 and Figure 4, respectively. The asterisk at position 1902 in SEQ ID
NO:5 indicates the position of the opal termination codon in the coding region of
the nonstructural polyprotein. The extra nucleotides relative to AR339 were in the
nonconserved half of nsP3, which contained insertions totalling 15 nucleotides, and
in the 3' untranslated region, which contained two 1-nucleotide deletions and a 1-
nucleotide insertion with respect to the consensus Sindbis AR339 genome. The
insertions found in the nsP3 gene of Girdwood S.A. were identical in position and
content to those found in S.A.AR86, although Girdwood S.A. did not have the
large nsP3 deletion characteristic of S.A.AR86. The remaining portions of the
genome contained the same number of nucleotides and predicted amino acids as
Sindbis AR339.
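The length bookkeeping reported in this Example for both genomes can be checked with a short calculation. The following Python sketch is illustrative only: the AR339 length of 11,703 nucleotides is not stated directly but is implied by the statement that the 11,663-nucleotide S.A.AR86 genome is 40 nucleotides shorter, and the dictionary and function names are hypothetical.

```python
# Assumption: AR339 length implied by the text (11,663 + 40).
AR339_LENGTH = 11_703

# Signed length changes relative to AR339 as reported in the text
# (insertions positive, deletions negative).
SAAR86_CHANGES = {
    "nsP3 insertions": 6 + 6 + 3,   # two 6-nt insertions and one 3-nt insertion
    "nsP3 deletion": -54,
    "3' UTR deletions": -2,         # two 1-nt deletions
    "3' UTR insertion": 1,
}
GIRDWOOD_CHANGES = {
    "nsP3 insertions": 15,
    "3' UTR deletions": -2,
    "3' UTR insertion": 1,
}


def genome_length(reference_length: int, changes: dict) -> int:
    """Reference length plus the net of all insertions and deletions."""
    return reference_length + sum(changes.values())


def net_codon_change(changes: dict, gene_prefix: str = "nsP3") -> int:
    """Net number of codons gained (positive) or lost (negative) in a gene."""
    nt = sum(v for k, v in changes.items() if k.startswith(gene_prefix))
    if nt % 3 != 0:
        raise ValueError("net change is not a whole number of codons")
    return nt // 3


print(genome_length(AR339_LENGTH, SAAR86_CHANGES))    # S.A.AR86 genome length
print(genome_length(AR339_LENGTH, GIRDWOOD_CHANGES))  # Girdwood S.A. genome length
print(net_codon_change(SAAR86_CHANGES))               # net nsP3 codon change
```

Under these inputs the S.A.AR86 and Girdwood S.A. totals agree with the reported genome lengths, and the net nsP3 change for S.A.AR86 agrees with the reported difference of 13 amino acids.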

Overall, Girdwood S.A. was 94.5% identical to the consensus
Sindbis AR339 sequence, differing at 655 nucleotides not including the insertions
and deletions. These nucleotide differences resulted in 88 predicted amino acid
changes or a difference of 2.3%. A plurality of amino acid differences were
concentrated in the nsP3 gene, which contained 32 of the amino acid changes, 25
of which were in the nonconserved 3' half.

The Girdwood S.A. nucleotides at positions 1, 3, and 11,717 could 
not be resolved. Because the primer used during the RT-PCR amplification of the 
3' end of the genome assumed a cytosine in the 3' terminal position, the identity 
of this nucleotide could not be determined with certainty. However, in all 
alphaviruses sequenced to date there is a cytosine in this position. This, combined 
with the fact that no difficulty was encountered in obtaining RT-PCR product for 
this region with an oligo(dT) primer ending with a 3'G, suggested that Girdwood 
S.A. also contains a cytosine at this position. The ambiguity at nucleotide 
positions 1 and 3 resulted from strong stops encountered during the RNA 
sequencing. 

EXAMPLE 3
Comparison of S.A.AR86 and Girdwood S.A.
Sequences With Other Sindbis-Related Virus Sequences

Table 1 examines the relationship of S.A.AR86 and Girdwood S.A.
to each other and to other Sindbis-related viruses. This was accomplished by
aligning the nucleotide and deduced amino acid sequences of Ockelbo82, AR339
and Girdwood S.A. to those of S.A.AR86 and then calculating the percentage
identity for each gene using the programs contained within the Wisconsin GCG
package (Genetics Computer Group, 575 Science Drive, Madison WI 53711), as
described in more detail in McKnight et al., J. Virol. 70, 1981-89 (1996).
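For illustration, percentage identity between two already-aligned sequences can be computed as in the following Python sketch. This is not the GCG implementation; it assumes gaps are written as "-" and that gapped columns are excluded from the comparison, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing an insertion or deletion
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 7 ungapped columns, 6 of them identical.
print(round(percent_identity("AUGGC-UA", "AUGACGUA"), 1))
```

Other conventions (for example, counting gap columns as mismatches) give different values, so the convention should be stated alongside any reported figure.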

The analysis suggests that S.A.AR86 is most similar to the other
South African isolate, Girdwood S.A., and that the South African isolates are more
similar to the Swedish Ockelbo82 isolate than to the Egyptian Sindbis AR339
isolate. These results also suggest that it is unlikely that S.A.AR86 is a
recombinant virus like WEE virus. Hahn et al., Proc. Natl. Acad. Sci. USA 85,
5997-6001 (1988).

EXAMPLE 4
Neurovirulence of S.A.AR86 and Girdwood S.A.
Girdwood S.A., Ockelbo82, and S.A.AR86 are related by sequence;
in contrast, it has previously been reported that only S.A.AR86 displayed the adult
mouse neurovirulence phenotype. Russell et al., J. Virol. 63, 1619-29 (1989).
These findings were confirmed by the present investigations. Briefly, groups of
four female CD-1 mice (3-6 weeks of age) were inoculated ic with 10^3 plaque-
forming units (PFU) of S.A.AR86, Girdwood S.A., or Ockelbo82. Neither
Girdwood S.A. nor Ockelbo82 infection produced any clinical signs of infection.
Infection with S.A.AR86 produced neurological signs within four to five days and
ultimately killed 100% of the mice as previously demonstrated.

Table 2 lists those amino acids of S.A.AR86 which might explain
the neurovirulence phenotype in adult mice. A position was scored as potentially
related to the S.A.AR86 adult neurovirulence phenotype if the S.A.AR86 amino
acid differed from that which otherwise was absolutely conserved at that position
in the other viruses.



TABLE 2

Divergent Amino Acids in S.A.AR86
Potentially Related to the Adult Neurovirulence Phenotype

Protein   Position in   S.A.AR86     Conserved
          S.A.AR86      Amino Acid   Amino Acid

nsP1      583           Thr          Ile
nsP2      256           Arg          Ala
          648           Ile          Val
          651           Lys          Glu
nsP3      344           Gly          Glu
          386           Tyr          Ser
          441           Asp          Gly
          445           Ile          Met
          537           Cys          Opal
E2        243           Ser          Leu
6K        30            Val          Ile
E1        112           Val          Ala
          169           Leu          Ser
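The scoring rule used for Table 2 (an S.A.AR86 residue that differs from a residue absolutely conserved in the other viruses) can be sketched in Python as follows. This is only an illustration under stated assumptions: the sequences are taken to be pre-aligned with "-" for gaps, positions are numbered along the ungapped target sequence, and the toy sequences and function name are hypothetical.

```python
def divergent_positions(target: str, others: list[str], gap: str = "-"):
    """Return (position, target_residue, conserved_residue) for every column
    where all other sequences share one residue and the target differs."""
    hits = []
    position = 0  # 1-based position within the ungapped target sequence
    for column in zip(target, *others):
        target_residue, rest = column[0], column[1:]
        if target_residue == gap:
            continue
        position += 1
        conserved = set(rest)
        # Absolutely conserved among the others, ungapped, and not the target residue.
        if len(conserved) == 1 and gap not in conserved and target_residue not in conserved:
            hits.append((position, target_residue, rest[0]))
    return hits


# Toy alignment: only position 2 is conserved in the others and divergent in the target.
print(divergent_positions("MTKIV", ["MIKIV", "MIKIV", "MIKLV"]))
```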
EXAMPLE 5
pS55 Molecular Clone of S.A.AR86
As a first step in investigating the unique adult mouse
neurovirulence phenotype of S.A.AR86, a full-length cDNA clone of the
S.A.AR86 genome was constructed. The sources of cDNA included conventional
cDNA clones (Davis et al., Virology 171, 189-204 (1989)) as well as uncloned
RT-PCR fragments derived from the S.A.AR86 genome. As described previously,
these were substituted, starting at the 3' end, into pTR5000 (McKnight et al., J.
Virol. 70, 1981-89 (1996)), a full-length Sindbis clone from which infectious
genomic replicas could be derived by transcription with SP6 polymerase in vitro.

The end result was pS55, a molecular clone of S.A.AR86 from
which infectious transcripts could be produced and which contained four nucleotide
changes (G for A at nt 215; G for C at nt 3863; G for A at nt 5984; and C for T
at nt 9113) but no amino acid coding differences with respect to the S.A.AR86
genomic RNA (amino acid sequence of S.A.AR86 presented in Figure 2 (SEQ ID
NO:2 and SEQ ID NO:3)). The nucleotide sequence of clone pS55 is presented
in Figure 5 (SEQ ID NO:7).

As has been described by Simpson et al., Virology 222, 464-69
(1996), neurovirulence and replication of the virus derived from pS55 (S55) were
compared with those of S.A.AR86. It was found that S55 exhibits the distinctive
adult neurovirulence characteristic of S.A.AR86. Like S.A.AR86, S55 produces
100% mortality in adult mice infected with the virus, and the survival times of
animals infected with the two viruses were indistinguishable. In addition, S55 and
S.A.AR86 were found to replicate to essentially equivalent titers in vivo, and the
profiles of S55 and S.A.AR86 virus growth in the central nervous system and
periphery were very similar.

From these data it was concluded that the silent changes found in
virus derived from clone pS55 had little or no effect on its growth or virulence,
and that this molecularly cloned virus accurately represents the biological isolate,
S.A.AR86.

EXAMPLE 6
Construction of the Consensus AR339 Virus TR339
The consensus sequence of the Sindbis virus AR339 isolate, the
prototype alphavirus, was deduced. The consensus AR339 sequence was inferred
by comparison of the TRSB sequence (a laboratory-derived AR339 strain) with the
complete or partial sequences of HR, P (the GenBank sequence; Strauss et al.,
Virology 133, 92-110 (1984)), SV1A, and NSV (AR339-derived laboratory strains;
Lustig et al., J. Virol. 62, 2329-36 (1988)), and SIN (a laboratory-derived AR339
strain; Davis et al., Virology 161, 101-108 (1987), Strauss et al., J. Virol. 65,
4654-64 (1991)). Each of these viruses was descended from AR339. Where these
sequences differed from each other, they also were compared with the amino acid
sequences of other viruses related to Sindbis virus: Ockelbo82, S.A.AR86,
Girdwood S.A., and the somewhat more distantly related Aura virus. Rumenapf
et al., Virology 208, 621-33 (1995).



The details of determining a consensus AR339 sequence and
constructing the consensus virus TR339 have been described elsewhere. McKnight
et al., J. Virol. 70, 1981-89 (1996); Klimstra et al., manuscript in preparation.
The nucleotide (SEQ ID NO:8) sequence of pTR339 is presented in Figure 6.
The deduced amino acid sequences of the pTR339 non-structural and structural
polyproteins are shown as SEQ ID NO:9 and SEQ ID NO:10, respectively. The
asterisk at position 1897 in SEQ ID NO:9 indicates the position of the opal
termination codon in the coding region of the nonstructural polyprotein. The
consensus nucleotide sequence diverged from the pTRSB sequence at three coding
positions (nsP3 528, E2 1, and E1 72). These differences are illustrated in Table
3.



TABLE 3

Amino Acid Differences Between
Laboratory Strain TRSB and Molecular Clone TR339

         nsP3 528 (nt 5683)   E2 1 (nt 8633)   E1 72 (nt 10279)

TR339    Arg (CGA)            Ser (AGC)        Ala (GCU)
TRSB     Gln (CAA)            Arg (AGA)        Val (GUU)
EXAMPLE 7
Animals Used for In Vivo Localization Studies
Specific pathogen free CD-1 mice were obtained from Charles River
Breeding Laboratories (Raleigh, North Carolina) at 21 days of age and maintained
under barrier conditions until approximately 37 days of age. Intracerebral (ic)
inoculations were performed as previously described, Simpson et al., Virology 222,
464-69 (1996), with 500 PFU of S51 (an attenuated mutant of S55) or 10^3 PFU of
S55. Animals inoculated peripherally were first anesthetized with METOFANE®.
Then, 25 µl of diluent (PBS, pH 7.2, 1% donor calf serum, 100 u/ml penicillin,
50 µg/ml streptomycin, 0.9 mM CaCl2, and 0.5 mM MgCl2) containing 10^3 PFU
of virus were injected either intravenously (iv) into the tail vein, subcutaneously
(sc) into the skin above the shoulder blades on the middle of the back, or
intraperitoneally (ip) in the lower right abdomen. Animals were sacrificed at
various times post-inoculation as previously described. Simpson et al., Virology 222,
464-69 (1996). Brains (including brainstems) were homogenized in diluent to 30%
w/v, and right quadriceps were homogenized in diluent to 25% w/v. Homogenates
were handled and titered as described previously. Simpson et al., Virology 222, 464-
69 (1996). Bone marrow was harvested by crushing both femurs from each animal
in sufficient diluent to produce a 30% w/v suspension (calculated as weight of
uncrushed femurs in volume of diluent). Samples were stored at -70°C. For
titration, samples were thawed and clarified by centrifugation at 1,000 x g for 20
minutes at 4°C before being titered by conventional plaque assay on BHK-21 cells.
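As a worked illustration of how a plaque count on such a homogenate converts to a titer per gram of tissue, the following Python sketch may be helpful. It is a generic calculation, not the protocol of the cited reference: the plaque count, dilution, and plated volume used in the example are hypothetical, and only the 30% w/v homogenate concentration is taken from the text.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """PFU per ml of undiluted sample.

    dilution is the fraction of the original sample in the plated material
    (e.g. 1e-2 for a 1:100 dilution).
    """
    return plaques / (dilution * volume_plated_ml)


def titer_pfu_per_gram(pfu_per_ml: float, percent_w_v: float) -> float:
    """Convert PFU/ml of homogenate to PFU/g of tissue.

    A homogenate at X% w/v contains X grams of tissue per 100 ml.
    """
    grams_per_ml = percent_w_v / 100.0
    return pfu_per_ml / grams_per_ml


# Hypothetical reading: 12 plaques from 0.2 ml of a 1:100 dilution of a 30% w/v homogenate.
per_ml = titer_pfu_per_ml(plaques=12, dilution=1e-2, volume_plated_ml=0.2)
per_gram = titer_pfu_per_gram(per_ml, percent_w_v=30.0)
print(f"{per_ml:.0f} PFU/ml, {per_gram:.0f} PFU/g")
```

The same functions give a nominal limit of detection when called with one plaque on the least-diluted sample plated.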

EXAMPLE 8 
Tissue Preparation for In Situ Hybridization Studies 
Animals were anesthetized by ip injection of 0.5 ml AVERTIN® at
various times post-inoculation followed by perfusion with 60 to 75 ml of 4%
paraformaldehyde in PBS (pH 7.2) at a flow rate of 10 ml per minute. The entire
carcass was decalcified for 8 to 10 weeks in 4% paraformaldehyde containing 8%
EDTA in PBS (pH 6.8) at 4°C. This solution was changed twice during the
decalcification period. Selected tissues were cut into blocks approximately 3 mm
thick and placed into biopsy cassettes for paraffin embedding and sectioning.
Blocks were embedded, sectioned and hematoxylin/eosin stained by Experimental
Pathology Laboratories (Research Triangle Park, North Carolina) or North
Carolina State University Veterinary School Pathology Laboratory (Raleigh, North
Carolina).

EXAMPLE 9 
In Situ Hybridization 
Hybridizations were performed using a [35S]-UTP labeled S.A.AR86
specific riboprobe derived from pDS-45. Clone pDS-45 was constructed by first
amplifying a 707 base pair fragment from pS55 by PCR using primers 7241 (5'-
CTGCGGCGGATTCATCTTGC-3', SEQ ID NO:11) and SC-3 (5'-
CTCCAACTTAAGTG-3', SEQ ID NO:12). The resulting 707 base pair fragment
was purified using a GENE CLEAN® kit (Bio101, CA), digested with HhaI, and
cloned into the SmaI site of pSP72 (Promega). Linearizing pDS-45 with EcoRV
and performing an in vitro transcription reaction with SP6 DNA-dependent RNA
polymerase (Promega) in the presence of [35S]-UTP resulted in a riboprobe
approximately 500 nucleotides in length, of which 445 nucleotides were
complementary to the S.A.AR86 genome (nucleotides 7371 through 7816). A
riboprobe specific for the influenza strain PR-8 hemagglutinin (HA) gene was used
as a control probe to test non-specific binding. The in situ hybridizations were
performed as described previously (Charles et al., Virology 208, 662-71 (1995)) using
10^5 cpm of probe per slide.

EXAMPLE 10 
Replication of S.A.AR86 in Bone Marrow 
Three groups of six adult mice each were inoculated peripherally
(sc, ip, or iv) with 1200 PFU of S55 (a molecular clone of S.A.AR86) in 25 µl
of diluent. Under these conditions, the infection produced no morbidity or
mortality. Two mice from each group were anesthetized and sacrificed at 2, 4 and
6 days post-inoculation by exsanguination. The serum, brain (including
brainstem), right quadriceps, and both femurs were harvested and titered by plaque
assay. Virus was never detected in the quadriceps samples of animals inoculated
sc (Table 4). A single animal inoculated ip (two days post-inoculation) and two
mice inoculated iv (at four and six days post-inoculation) had detectable virus in
the right quadriceps, but the titer was at or just above the limit of detection (6.25
PFU/g tissue). Virus was present sporadically or at low levels in the brain and
serum of animals regardless of the route of inoculation. Virus was detected in the
bone marrow of animals regardless of the route of inoculation. However, the
presence of virus in bone marrow of animals inoculated sc or ip was more sporadic
than in animals inoculated iv, where five out of six animals had detectable virus.
These results suggest that S55 targets to the bone marrow, especially following iv
inoculation.

The level and frequency of virus detected in the serum and muscle
suggested that virus detected in the bone marrow was not residual virus
contamination from blood or connective tissue remaining in bone marrow samples.
The following experiment also suggested that virus in bone marrow was not due
to tissue or serum contamination. Mice were inoculated ic with 1200 PFU of S55
in 25 µl of diluent. Animals were sacrificed at 0.25, 0.5, 1, 1.5, 2, 3, 4, 5, and
6 days post-inoculation, and the carcasses were decalcified as described in
Example 8. Coronal sections taken at approximately 3 mm intervals through the
head, spine (including shoulder area), and hips were probed with an S55-specific
[35S]-UTP labeled riboprobe derived from pDS-45. Positive in situ hybridization
signal was detected by one day post-inoculation in the bone marrow of the skull
(data not shown). Weak signal also was present in some of the chondrocytes of
the vertebrae, suggesting that S55 was replicating in these cells as well. Although
the frequency of positive bone marrow cells was low, the signal was very intense
over individual positive cells. This result strongly suggests that S55 replicates in
vivo in a subset of cells contained in the bone marrow.

EXAMPLE 11
Other Sindbis Group Viruses
It was of interest to determine if the ability to replicate in the bone
marrow of mice was unique to S55 or was a general feature of other viruses, both
Sindbis and non-Sindbis viruses, in the Sindbis group. Six 38-day-old female CD-1
mice were inoculated iv with 25 µl of diluent containing 10^3 PFU of S55,
Ockelbo82, Girdwood S.A., TR339, or TRSB. At 2, 4 and 6 days post-
inoculation two mice from each group were sacrificed and whole blood, serum,
brain (including brainstem), right quadriceps, and both femurs were harvested for
virus titration.

The results of this experiment were similar to those with S55.
TRSB infected animals had no virus detectable in serum or whole blood in any
animal at any time, and with the other viruses tested, no virus was detected in the
serum or whole blood of any animal beyond two days post-inoculation (detection
limit, 25 PFU/ml). Neither TRSB nor TR339 was detectable in the brains of
infected animals at any time post-inoculation. S55, Girdwood S.A., and
Ockelbo82 were present in the brains of infected animals sporadically with the
titers being at or near the 75 PFU/g level of detection. All the tested viruses were
found sporadically at or slightly above the 50 PFU/g detection limit in the right
quadriceps of infected animals except for a single animal four days post-inoculation
with TRSB which had nearly 10^5 PFU/g of virus in its quadriceps.

The frequency at which the different viruses were detected in bone
marrow varied widely, with S55 and Girdwood S.A. being the most frequently
isolated (five out of six animals) and Ockelbo82 and TRSB being the least
frequently isolated from bone marrow (one out of six animals and two out of six
animals, respectively) (Table 4). Girdwood S.A. and S55 gave nearly identical
profiles in all tissues. Girdwood S.A., unlike S.A.AR86, is not neurovirulent in
adult mice (Example 4), suggesting that the adult neurovirulence phenotype is
distinct from the ability of the virus to replicate efficiently in bone marrow.

EXAMPLE 12
Virus Persistence in Bone Marrow
The next step in our investigations was to evaluate the possibility
that S.A.AR86 persisted long-term in bone marrow. S51 is a molecularly cloned,
attenuated mutant of S55. S51 differs from S55 by a threonine for isoleucine
substitution at amino acid residue 538 of nsP1 and is attenuated in adult mice
inoculated intracerebrally. Like S55, S51 targeted to and replicated in the bone
marrow of 37-day-old female CD-1 mice following ic inoculation. Mice were
inoculated ic with 500 PFU of S51 and sacrificed at 4, 8, 16, and 30 days post-
inoculation for determination of bone marrow and serum titers. At no time post-
inoculation was virus detected in the serum above the 6.25 PFU/ml detection limit.
Virus was detectable in the bone marrow samples of both animals sampled at four
days post-inoculation and in one animal eight days post-inoculation (Table 5). No
virus was detectable by titration on BHK-21 cells in any of the bone marrow
samples beyond eight days post-inoculation. These results suggested that the
attenuating mutation present in S51, which reduces the neurovirulence of the virus,
did not impair acute viral replication in the bone marrow.

It was notable that the plaque size on BHK-21 cells of virus
recovered on day 4 post-inoculation was smaller than the size of plaques produced
by the inoculum virus, and that plaques produced from virus recovered from the
day 8 post-inoculation samples were even smaller and barely visible. This
suggests a strong selective pressure in the bone marrow for virus that is much less
efficient in forming plaques on BHK-21 cells.



To demonstrate that S51 virus genomes were present in bone
marrow cells long after acute infection, four- to six-week-old female CD-1 mice
were inoculated ic with 500 PFU of S51. Three months post-inoculation two
animals were sacrificed, perfused with paraformaldehyde and decalcified as
described in Example 8. The heads and hind limbs from these animals were
paraffin embedded, sectioned, and probed with a S.A.AR86 specific [35S]-UTP
labeled riboprobe derived from clone pDS-45. In situ hybridization signal was
clearly present in discrete cells of the bone and bone marrow of the legs (data not
shown). Furthermore, no in situ hybridization signal was detected in an adjacent
control section probed with an influenza virus HA gene specific riboprobe. As the
relative sensitivity of in situ hybridization is reduced in decalcified tissues (Peter
Charles, personal communication), these cells likely contain a relatively high
number of viral sequences, even at three months post-inoculation. No in situ
hybridization signal was observed in mid-sagittal sections of the heads with the
S.A.AR86 specific probe, although focal lesions were observed in the brain
indicative of the prior acute infection with S51.



TABLE 5

S51 Titers in Bone Marrow Following IC Inoculation of 500 PFU

Days Post-     Titers (Total PFU/Animal)    Limit of
Inoculation    Animal A      Animal B       Detection

4              2100          380            62.5
8              62.5          N.D.(a)        62.5
16             N.D.          N.D.           62.5
30             N.D.          N.D.           62.5

(a) "N.D." indicates that the virus titers were below the limit of detection.
EXAMPLE 13
Replication of S.A.AR86 within Bone/Joint Tissue of Adult Mice

Several Old World alphaviruses, including Ross River virus,
Chikungunya virus, Ockelbo82, and S.A.AR86, are associated with acute and persistent
arthritis/arthralgia in humans. Molecular clones of several Sindbis group viruses,
including S.A.AR86, were used to investigate alphavirus replication within bone/joint
tissue.

Following intravenous inoculation of S.A.AR86 into adult CD-1 mice,
viral replication was observed in bone/joint tissue, but not surrounding muscle tissue of
the hind limbs. Infectious virus was detectable 24 hours post-infection; however, viral
titer within bone/joint tissue was maximal 72 hours post-infection. Fractionation of
hind limbs from infected animals revealed that the hip and knee joints were the
predominant sites of viral replication. Replication within bone/joint tissue appears to be
a common trait of Sindbis-group viruses, since the laboratory strains TR339 and TRSB
also replicated within bone/joint tissue. In situ hybridization and S.A.AR86 based
double promoter vectors expressing green fluorescent protein were used to further
localize S.A.AR86 infected cells within bone/joint tissue. Green fluorescent protein
expression was detected in bone/joint tissue for at least one month post-inoculation.
These studies demonstrated that cells within the endosteum of synovial joints were the
predominant site of S.A.AR86 replication.

THAT WHICH IS CLAIMED IS:

1. A method of introducing and expressing heterologous RNA 
in bone marrow cells, comprising: 

(a) providing a recombinant alphavirus, said alphavirus 
containing a heterologous RNA segment, said heterologous RNA segment 
comprising a promoter operable in said bone marrow cells operatively associated 
with a heterologous RNA to be expressed in said bone marrow cells; and then 

(b) contacting said recombinant alphavirus to said bone marrow 
cells so that said heterologous RNA segment is introduced and expressed therein. 

2. A method according to claim 1, wherein said contacting step
is carried out in vitro.

3. A method according to claim 1, wherein said contacting step
is carried out in vivo in a subject in need of such treatment.

4. A method according to claim 1, wherein said heterologous 
RNA encodes a protein or peptide. 

5. A method according to claim 1, wherein said heterologous
RNA encodes an immunogenic protein or peptide.

6. A method according to claim 1, wherein said heterologous 
RNA encodes an antisense oligonucleotide or a ribozyme. 

7. A method according to claim 1, wherein said alphavirus is 
an Old World alphavirus. 

8. A method according to claim 1, wherein said alphavirus is
selected from the group consisting of SF group and SIN group alphaviruses.

9. A method according to claim 1, wherein said alphavirus is
selected from the group consisting of Semliki Forest virus, Middelburg virus,
Chikungunya virus, O'Nyong-Nyong virus, Ross River virus, Barmah Forest
virus, Getah virus, Sagiyama virus, Bebaru virus, Mayaro virus, Una virus,
Sindbis virus, South African Arbovirus No. 86, Ockelbo virus, Girdwood S.A.
virus, Aura virus, Whataroa virus, Babanki virus, and Kyzylagach virus.

10. A method according to claim 1, wherein said alphavirus is
South African Arbovirus No. 86.

11. A method according to claim 1, wherein said alphavirus is
Girdwood S.A.

12. A method according to claim 1, wherein said alphavirus is
Sindbis strain TR339.

13. A helper cell for expressing an infectious, propagation 
defective, Girdwood S.A. virus particle, comprising, in a Girdwood S.A.- 

15 permissive cell: 

(a) a first helper RNA encoding (i) at least one Girdwood S.A. 
structural protein, and (ii) not encoding at least one other Girdwood S.A. structural 
protein; and 

(b) a second helper RNA separate from said first helper RNA,
said second helper RNA (i) not encoding said at least one Girdwood S.A.
structural protein encoded by said first helper RNA, and (ii) encoding said at least
one other Girdwood S.A. structural protein not encoded by said first helper RNA,
and with all of said Girdwood S.A. structural proteins encoded by said first and
second helper RNAs assembling together into Girdwood S.A. particles in said cell
containing said replicon RNA;

and wherein the Girdwood S.A. packaging segment is deleted from 
at least said first helper RNA. 

14. The helper cell according to claim 13, further containing a
replicon RNA;

said replicon RNA encoding said Girdwood S.A. packaging segment
and an inserted heterologous RNA;

wherein said Girdwood S.A. packaging segment is deleted from at 
least one of said helper RNA; 

and wherein said replicon RNA, said first helper RNA, and said 
second helper RNA are all separate molecules from one another. 

15. The helper cell according to claim 13, further containing a
replicon RNA;

said replicon RNA encoding said Girdwood S.A. packaging segment
and an inserted heterologous RNA;

wherein said replicon RNA and said first helper RNA are separate
molecules;

and wherein the molecule containing said replicon RNA further
contains RNA encoding said at least one Girdwood S.A. structural protein not
encoded by said first helper RNA.

16. The helper cell according to claim 13, wherein said first 
helper RNA encodes both the Girdwood S.A. E1 glycoprotein and the Girdwood
S.A. E2 glycoprotein, and wherein said second helper RNA encodes the Girdwood 
S.A. capsid protein. 

17. A method of making infectious, propagation defective, 
Girdwood S.A. virus particles, comprising: 

transfecting a Girdwood S.A.-permissive cell according to claim 13 
with a propagation defective replicon RNA, said replicon RNA including said 
Girdwood S.A. packaging segment and an inserted heterologous RNA; 

producing said Girdwood S.A. virus particles in said transfected
cell; and then

collecting said Girdwood S.A. virus particles from said cell. 

18. Infectious Girdwood S.A. virus particles produced by the
method of claim 17.

19. Infectious Girdwood S.A. virus particles containing a
replicon RNA encoding a promoter, an inserted heterologous RNA, and wherein
RNA encoding at least one Girdwood S.A. structural protein is deleted therefrom
so that said Girdwood S.A. virus particle is propagation defective.

20. A pharmaceutical formulation comprising infectious
Girdwood S.A. virus particles according to claim 18 or 19 in a pharmaceutically
acceptable carrier.

21. A helper cell for expressing an infectious, propagation 
defective, TR339 virus particle, comprising, in a TR339-permissive cell: 

(a) a first helper RNA encoding (i) at least one TR339 structural 
protein, and (ii) not encoding at least one other TR339 structural protein; and 

(b) a second helper RNA separate from said first helper RNA, 
said second helper RNA (i) not encoding said at least one TR339 structural protein 
encoded by said first helper RNA, and (ii) encoding said at least one other TR339 
structural protein not encoded by said first helper RNA, and with all of said 
TR339 structural proteins encoded by said first and second helper RNAs 
assembling together into TR339 particles in said cell containing said replicon 
RNA; 

and wherein the TR339 packaging segment is deleted from at least 
said first helper RNA. 

22. The helper cell according to claim 21, further containing a 
replicon RNA; 

said replicon RNA encoding said TR339 packaging segment and an
inserted heterologous RNA;

wherein said TR339 packaging segment is deleted from at least one 
of said helper RNA; 

and wherein said replicon RNA, said first helper RNA, and said 
second helper RNA are all separate molecules from one another.

23. The helper cell according to claim 21, further containing a
replicon RNA;

said replicon RNA encoding said TR339 packaging segment and an
inserted heterologous RNA;

wherein said replicon RNA and said first helper RNA are separate
molecules;

and wherein the molecule containing said replicon RNA further 
contains RNA encoding said at least one TR339 structural protein not encoded by 
said first helper RNA. 

24. The helper cell according to claim 21, wherein said first
helper RNA encodes both the TR339 E1 glycoprotein and the TR339 E2
glycoprotein, and wherein said second helper RNA encodes the TR339 capsid
protein.

25. A method of making infectious, propagation defective, 
TR339 virus particles, comprising: 

transfecting a TR339-permissive cell according to claim 21 with a 
propagation defective replicon RNA, said replicon RNA including said TR339 
packaging segment and an inserted heterologous RNA; 

producing said TR339 virus particles in said transfected cell; and
then

collecting said TR339 virus particles from said cell. 

26. Infectious TR339 virus particles produced by the method of
claim 25.

27. Infectious TR339 virus particles containing a replicon RNA
encoding a promoter, an inserted heterologous RNA, and wherein RNA encoding
at least one TR339 structural protein is deleted therefrom so that said virus particle
is propagation defective.

28. A pharmaceutical formulation comprising infectious TR339
virus particles according to claim 26 or 27 in a pharmaceutically acceptable carrier.

29. A recombinant DNA comprising a cDNA coding for an
infectious Girdwood S.A. virus RNA transcript and a heterologous promoter 
positioned upstream from said cDNA and operatively associated therewith. 

30. An infectious RNA transcript encoded by a cDNA according
to claim 29.

31. An infectious RNA according to claim 30, said infectious 
Girdwood S.A. RNA transcript containing a heterologous RNA segment, said 
heterologous RNA segment comprising a promoter operably associated with a 
heterologous RNA. 

32. Infectious viral particles containing an RNA transcript
according to claim 30.

33. A recombinant DNA comprising a cDNA coding for a 
Sindbis strain TR339 RNA transcript and a heterologous promoter positioned 
upstream from said cDNA and operatively associated therewith. 

34. An infectious RNA transcript encoded by a cDNA according
to claim 33.

35. An infectious RNA according to claim 34, said infectious 
Girdwood S.A. RNA transcript containing a heterologous RNA segment, said 
heterologous RNA segment comprising a promoter operably associated with a
heterologous RNA.

36. Infectious viral particles containing an RNA transcript 
according to claim 34. 

Nucleotide Sequence of S.A.AR86

1 ATTGCCCCCC TACTACACAC tattcaatca AACACCCCAC caattccact accatcacaa tccacaaccc actacttaac CTACACCTAC accctcagag

101 TCCGTTTGTC GTGCAACTGC AAAAGAGCTT CCCGCAATTT GAGGTAGTAG CACAGCAGGT CACTCCAAAT GACCATGCTA ATGCCAGAGC ATTTTCGCAT 
201 CTGGCCAGTA AACTAATCGA GCTCGAGGTT CCTACCACAG CGAOGATTTT GGACATAGGC AGCGCACCGG CTCGTAGAAT GTTTrCCGAC CACCAGTACC 
301 ATTGCGTTTO CCCCATGCGT AGTCCAGAAG ACCCGGACCG CATGATGAAA TATGCCAGCA AACTCCCGGA AAAAGCATGT AAGATTACAA ACAAGAACTT
401 GCATGAGAAG ATCAAGGACC TCCGGACCGT ACTTGATACA CCGGATGCTG AAACGCCATC ACTCTGCTTC CACAACGATG TTACCTGCAA CACGCGTGCC 
501 GAGTACTCCG TCATGCAGGA CGTGTACATC AACGCTCCCG gaactattta CCACCAGCCT ATGAAAGCCG tgcggaccct gtactgcatt cgcttcgaca 
601 CCACCCAGTT CATGTTCTCG GCTATGGCAG CTTCCTA CCC TGCATACAAC ACCAACTGCG CCGACGAAAA AGTCCTTGAA GCGCGTAACA TCGOACTCT G 
701 CAGCACAAAG CTGAGTGAAG GCAGGACACG AAAGTTCTCG ATAATGAGGA AGAAGGAGTT GAAGCCCCGG TCACGCGTTT ATTTCTCCGT TGGATCGACA 
801 CTTTACCCAG AACACAGACC CACCTTGCAQ ACCTCGCATC TTCCATCGGT CTTCCACTTG AAAGGAAAGC AGTCGTACAC TTGCCGCTGT GATACAGTCG 
901 TGAGCTGCGA AGGCTACGTA GTGAAGAAAA TCACCATCAG TCCCGCGATC ACGGGAGAAA CCGTGGGATA CGCCGTTACA AACAATACCG AGGGCTTCTT 
1001 GCTATCCAAA GTTACCGATA CACTAAAACG AGAACGCGTA TCGTT C CCCG TCTGCACCTA TATCCCGGCC ACCATATGCG ATCAGATGAC CGCCATAATG 
1101 GCCACGGATA TCTCACCTGA CGATCCACAA AAACTTCTGG TTGGGCTCAA CCAGCGAATC GTCATTAACG GTAAGACTAA CACGAACACC AATACCATGC 
1201 AAAATTACCT TCTCCCAATC ATTGCACAAG GGTTCAGCAA ATGGGCCAAG GAGCGCAAAG AAGATCTTGA CAATGAAAAA ATGCTGGGCA CCAGAGAGCG 
1301 CAAGCTTACA TATGGCTGCT TGTGCGCGTT TCGCACTAAG AAAGTGCACT CGTTCTATCG CCCACCTGGA ACCCAGACCA TCGTAAAAGT CCCAGCCTCT 
1401 TTTAGCGCTT TCCCCATGTC ATCCGTATGG ACTACCTCTT TGCCCATGTC GCTGAGGCAG AAGATGAAAT TGGCATTACA ACCAAAGAAG GAGGAAAAAC 
1501 TGCTCCAAGT CCCGGAGGAA TTAGTTATGG AGCCCAAGGC TGCTTTCGAG CATGCTCAGG AGGAATCCAG AGCGCAGAAG CTCCGAGAAG CACTCCCACC 
1601 ATTAGTGCCA GACAAAGGTA TCGAGGCAGC TGCGGAAGTT GTCTGCGAAG TGGAGGGGCT CCAGGCGGAC ACCGGAGCAG CACTCGTCGA AACCCCGCCC 
1701 GGTCATGTAA GGATAATACC TCAAGCAAAT GACCGTATGA TCGGACAGTA TATCGTTGTC TCGCCGATCT CTGTGCTGAA GAACGCTAAA CTCGCACCAG 
1801 CACACCCGCT AGCAGACCAG gttaagatca TAACGCACTC CGGAAGATCA ggaaggtatg cagtcgaacc atacgaccct aaagtactga tcccaccacg
1901 AAGTGCCGTA CCATGGCCAO AATTCTTAGC ACTGAGTGAG AGCGCCACGC TTGTGTACAA CGAAAGAGAG TTTGTGAACC GCAAGCTGTA CCATATTGCC 
2001 ATGCACGGTC CCCCTAAGAA TACAGAAGAG GAGCAGTACA AGGTTACAAA GGCAGAGCTC CCAGAAACAG AGTACG rOTT TGACGTGGAC AAGAAGCGAT 
2101 GCGTTAAGAA ggaagaagcc TCAGGACTTG tcctttcggg AGAACTGACC AACCCCCCCT ATCACGAACT AG L" I CI I GAG ggactgaaga ctcgacccgc 
2201 GGTCCCGTAC AAGGTFGAAA CAATAGGAGT GATAGGCACA CCAGGATCGG GCAAGTCAGC TATCATCAAG TCAACTGTCA CCGCACGTGA TCTTGTTACC 
2301 AGCGGAAAGA AAGAAAACTG CCGCGAAATT GAGGCCGACG TGCTACGGCT GAGGGGCATG CAGATCACGT CGAAGACAGT GCATTCGGTT ATGCTCAACG 
2401 GATGCCACAA AGCCGTAGAA GTGCTGTATG TTGACGAACC GTTCCGGTGC CACGCAGGAG CACTACTTGC CTTGATTGCA ATCGTCAGAC CCCCTAAGAA 
2501 GGTAGTACTA TGCGGAGACC CTAAGCAATG CCGATTCTTC AACATGATGC AACTAAAGGT ACATTTCAAC CACCCTGAAA AAGACATATG TACCAAGACA 
2601 TTCTACAAGT TTATCTCCCG ACGTTGCACA CACCCAGTCA CGGCTATTGT ATCGACACTG CATTACGATG GAAAAATGAA AACCACAAAC CCGTGCAAGA 
2701 AGAACATCGA AATCGACATT ACAGGGGCCA CGAAGCCGAA GCCAGGGGAC ATCATCCTGA CATGTTTCCG CCGGTGGGTT AAGCAACTCC AAATCGACTA 
2801 TCCCGGACAT GAGGTAATGA CAGCCGCGCC CTCACAAGGG CTAACCAGAA AAGGAGTATA TGCCGTCCCG CAAAAAGTCA ATGAAAACCC GCTGTACGCG 
2901 ATCACATCAG AGCATGTGAA CGTCTTGCTC ACCCGCACTG AGGACAGGCT AGTATGGAAA ACTTTACAGG GCGACCCATG GATTAAGCAG CTCACTAACG 
3001 TACCTAAAGG AAATTTTCAC GCCACCATCG AGGACTCGGA AGCTGAACAC AAGGOAATAA TTGCTGCGAT AAACAGTCCC GCTCCCCGTA CCAATCCGTT 
3101 CAGCTGCAAG ACTAACGTTT GCTGCGCGAA AGCACTGGAA CCGATACTGG CCACGGCCGG TATCGTACTT ACCGGTTGCC AGTGGaGCGA GCTGTTCCCA 
3201 CAGTTTGCGG ATGACAAACC ACACTCGGCC ATCTACGCCT TAGACGTAAT TTGCATTAAG TTTTTCGGCA TGGACTTGAC AAGCGGCCTG TTTTCCAAAC 
3301 AGAGCATCCC GTTAACGTAC CATCCTCCCG ACTCAGCGAG GCCAGTACCT CATTGGGACA ACAGCCCAGG AACACGCAAG TATGGGTACG ATCACGCCGT 

3401 tgccgccgaa ctctccccta gatttccggt gttccagcta gctgggaaag gcacacaccttgatttgcag acgggcagaa ctagagttat ctctgcacag 

3501 CATAACTTGG TCCCACTGAA CCGCAATCTC CCTCACGCCT TAGTCCCCGA GCACAAGGAG AAACAACCCG GCCCGGTCGa AAAATTCTTG ACCCAGTTCA 
3601 AACACCACTC CGTACTTGTG atctcagaga AAAAAATTGA ACCTCCCCAC AAGAGAATCG AATGGATCGC CCCGATTCCC atagccggcg CAGATAAGAA 
3701 CTACAACCTG GCTTTCGCGT TTCCGCCGCA GGCACGGTAC GACCTGCTGT TCATCAATAT TCGAACTAAA TACAGAAACC ATCACTTTCA ACAGTGCGAA 
3801 GACCACCCCC CCACCTTGAA aaccctttcg cgttcggccc tcaactccct taacccccca cccaccctcc tcgtcaagtc ctacggttac ccccacccca

3901 ATAGTGAGGA CGTAGTCACC CCTCTTCCCA GAAAATTTGT CACAGTCT C T CCACCCACCC CAGACTCCCT CTCAACCAAT ACAGAAATCT ACCTCATTTT 
4001 CCCACAACTA CACAACACCC CCACACCACA ATTCACCCCC CATCATTTGA ATTCTCTC AT 1 1CU I CCC T G TACCACCCTA CAAGACACCO ACTTCCACCC 
4101 CCACCGTCCT ACCCTACTAA AACGCACAAC ATTGCTGATT GTCAAGAGGA ACCAGTTCTC AATCCACCCA ATCCACTCCG CAGACCACGA GAAGGACTCT 
4201 CCCCTCCCAT CTATAAACGT TCCCCGAACA CTTTCACCCA TTCAGCCACA CAGACACCTA CCGCAAAACT GACTGTGTGC CAAGGAAAGA AAGTGATCCA
4301 CCCGGTTGCC CCTGATTTXC CGAAACACCC AGAGGCAGAA G CC CT G AAAT TGCTCCAAAA CGCCTACCAT GCAGTGGCAG ACTTAGTAAA TOAACATAAT 
4401 ATCAA G TCT G TCGCCATCCC ACTGCTATCT ACAGGCATTT ACGCAGCCGG AAAAGACCGC CTTGAGGTAT CACTTAACTG CTTGACAACC GCGCTAGACA
4501 GAACTGATGC GGACGTAACC ATCTACTCCC TGGATAAGAA GTGGAAGGAA AGAATCGACG CGG'T GC TC CA ACTTAAGGAG TCTGTAACTG AGCTGAAGGA 
4601 TGAGGATATG GACATCGACG ACCAGTTAGT ATGGATCCAT CCGGACACTT GCCTGAAGCG AAGAAAGGGA TTCAGTACTA CAAAAGGAAA GTTGTATTCG 

4701 TACTTTGAAG GCACCAAATT ccatcaagca gcaaaagata tggccgagat aaaggtcctg ttcccaaatg accaggaaag caacgaacaa ctgtgtccct 

4801 ACATATTGCG GGACACCATG GAAGCAATCC GCGAAAAATG CCCGGTCCAC CACAACCCCT CGTCTAGCCC GCCAAAAACG CTGCCGTGCC TCTGTATGTA
4901 TGCCATGACG CCAGAAAGGG TCCACAGACT CAGAAGCAAT AACGTCAAAG AAGTTACAGT ATGCTCCTCC ACCCCCCTTC CAAAGTACAA AATCAAGAAT 
5001 CTTCAGAACG TTCAGTGCAC AAAAGTAGTC CTGTTTAACC CGCATACCCC CGCATTCCTT CCCGCCCGTA AGTACATAGA AGCACCAGAA CAGCCTCCAG 
5101 CTCCGCCTGC ACAGGCCGAG GAGGCCCCCG GAGTTGTAGC GACACCAACA CCACCTGCAG CTGATAACAC CTCGCTTGAT GTCACGGACA TCTCACTGCA 
5201 CATGGAAGAC AGTAGCGAAG GCTCACTC7T TTCGAGCTTT AGCGGATCGG ACAACTACCG AAGGCAGGTG GTCGTGGCTG ACGTCCATGC CGTCCAAGAG 
5301 CCTGCCCCTC TTCCACCGCC AAGGCTAAAG AAGATGGCCC GCCTGGCAGC GGCAAGAATG CAGGAAGACC CAACTCCACC GGCAAGCACC AGCTCTGCGG 
5401 ACGAGTCCCT TCACCTTTCT TTTGATGGGG TATCTATATC CTTCGCATCC CTTTTCCACG GAGAGATGGC CCGCTTGGCA GCGGCACAAC CCCCGGCAAG 
5501 TACATGCCCT ACGGATGTCC CTATGTCTTT CGGATCGTtT TCCGACGGAG AGATTGAGGA GTTCAGCCGC AGAGTAACCG AGTCGGAGCC CGTCCTGTTT 
5601 GGGTCATTTG AACCGGCCCA AGTGAACTCA ATTATATCGT CCCGATCAGC CCTATCTTTT CCACCACGCA ACCAGAGACG TAGACGCAGG AGCAGOAGGA 

5701 CCCAATACTG tctaaccggg gtagctgggt acatattttc gacggacaca ggccctgccc acttgcaaaa GAAGTCCGTT ctgcagaacc AGCTTACAGA 

5801 ACCGACCTTG GAGCGCAATG TTCTGGAAAG AATCTACGCC CCGGTGCTCG ACACGTCGAA AGAGGAACAG CTCAAACTCA GGTACCAGAT GATGCCCACC 
5901 GAAGCCAACA AAAGCAGGTA CCAGTCTCGA AAA GT AC AAA ACCAOAAAGC CATAACCACT GAGCGACTGC TTTCAGGGCT ACGACTGTAT AACTCTGCCA 
6001 CAGATCAGCC AGAATGCTAT AAGATCACCT ACCCGAAACC ATCGTATTCC AGCAGTGTAC CAGCGAACTA CTCTGACCCA AAGTTTGCTO TAGCTGTTTO 
6101 TAACAACTAT CTCCATGAGA ATTACCCGAC GGTAGCATCT TATCAGATCA CCGACGAGTA CGATGCTTAC TTGGATATGG TAGACGGGAC AGTCCCTTGC 
6201 CTAGATACTG CAA CTTTTTO CCCCCCCAAG CTTAGAAGTT ACCCGAAAAG ACACGAGTAT AGAGCCCCAA ACATCCGCAG TGCCGTTCCA TCAGCGATGC 

6301 AGAACACGTT ccaaaacgtg ctcattgccg cgactaaaag aaactgcaac gtcacacaaa tgcgtgaact gccaacactg gactcagcga cattcaacgt 

6401 TGAATGCTTT CGAAAATATG CATCCAATGA CGAGTATTGG GAGGAGTTTG CCCGAAAGCC AATTAGGATC ACTA CTG ACT TCGTTACCGC ATACGTGGCC 
6501 AGACTGAAAG GCCCTAAGGC CGCCGCACTG TTCCCAAAGA CGCATAATTT GGTCCCATTG CAAGAAGTGC CTATGGATAG ATTCGTCATO GACATGAAAA 
6601 CAGACGTGAA AGTTACACCT GGCACGAAAC ACACAGAAGA AAGACCGAAA GTACAAGTGA TACAAGCCGC AGAACCCCTG GCGACCGCTT ACCTATGCGG 
6701 GATCCACCGG GAGTTAGTGC GCAGGCTTAC AGCCGTTTTG CTACCCAACA TTCACACGCT CTTTGACATG TCGGCGCAGG ACTTTGATGC AATCATAGCA 
6801 GAACACTTCA AGCAAGGTGA CCCGGTACTG GAGACGGATA TCCCCTCGTT CGACAAAAGC CAAGACGACG CTATCGCGTT AACCGGCCTG ATGATCTTGG 
6901 AAGACCTGGG TGTGGACCAA CCACTACTCG ACTTGATCGA GTGCGCCTTT GGAGAAATAT CATCCACCCA TCTGCCCACG GGTACCCGTT TCAAATTCCG 
7001 GGCGATGA7G AAATCCGGAA TCTTCCTCAC GCTCTTTGTC AACACAGTTC TGAATGTCGT TATCGCCAGC AGAGTATTGG AGGAGCGGCT TAAAACGTCC 
7101 AAATGTGCAG CATTTATCGG CGACGACAAC ATTATACACG GAGTAGTATC TGACAAAGAA ATGGCTGAGA GGTGTGCCAC CTGGCTCAAC ATCGAGGTTA 
7201 AGATCATTGA CGCAGTCATC GGCGAGAGAC CACCTTACTT CTGCGGTGGA TTCATCTTCC AAGATTCGGT TACCTCCACA GCGTGTCGCG TGCCGGACCC 
7301 CTTGAAAAGG CTGTTTAAGT TGGGTAAACC GCTCCCAGCC GACGATGAGC AAGACGAAGA CAGAAGACGC GCTCTGCTAG ATGAAACAAA GGCGTGGTTT 
7401 ACAGTAGGTA TAACAGACAC CTTAGCAGTG GCCGTGGCAA CTCGCTATGA GGTAGACAAC ATCACACCTC TCCTGCTGCC ATTGAGAACT TTTGCCCAGA 
7501 GCAAAAGAGC ATTTCAAGCC ATCAGAGGGG AAATAAAGCA TCTCTACGGT GGTCCTAAAT AGTCAGCATA GTACATTTCA TCTGACTAAT ACCACAACAC
7601 CACCACCATG AATAGAGGAT TCTTTAACAT GCTCGGCCCC CGCCCCTTCC CAGCCCCCAC TGCCATGTGG AGGCCGCGGA GAAGGAGGCA GCCCGCCCCG 
7701 ATGCCTGCCC GCAATGGGCT GGCTTCCCAA ATCCAGCAAC TGACCACAGC CGTCAGTCCC CTAGTCATTG GACAGGCAAC TAGACCTCAA ACCCCACGCC 
7801 CACGCCCGCC GCCCCCCCAG AAGAAGCAGG CGCCAAACCA ACCACCGAAG CCGAAGAAAC CAAAAACACA GGAGAAGAAC AAGAAGCAAC CTGCAAAACC 
7901 CAAACCCCGA AACAGACACC GTATGGCACT TAACTTCCAC GCCGACAGAC TGTTCGACGT CAAAAATGAG GACGGAGATG TCATCGGGCA CGCACTGGCC
8001 ATGGAAGGAA AGGTAATGAA ACCACTCCAC CTGAAAGGAA CTATTGACCA C CCTGT CC IA TCAAACCTCA AATTCACCAA G TCGTC AGCA TACGACATCG
8101 AGTTCCCACA GTTGCC CC TC AACATGAGAA GTCAGGCGTT CACCTACACC ACTCAACACC CTGAAGGGTT CTACAACTGG CACCACCGAG CGGTGCACTA 
8201 TAGTGGAGCC AGATTTACCA TCCCCCGCCO AGTAGGACGC AGAGGAGACA G TGGT CGTCC GATTATGGAT AACTCAGGCC CCGTTGTCGC GATAGTCCTC 
8301 GGAGGGGCTG ATGAGGGAAC AAGAACCCCC CTTTCGGTCG TCACCTGGAA TAGCAAAGGG AAGACAATCA AGACAACCCC GGAAGGGACA GAAGACTCGT 
8401 CTGCTGCACC ACTCGTCACG GCCATGTGCT TGCTTGGAAA CGTGAGCTTC CCATGCAATC CCCCGCCCAC ATCCTACACC CGCGAACCAT CCAGAGCTCT
8501 CGACATCCTC GAAGAGAACG TGAACCACCA GGCCTACGAC A CCC T GC T C A ACGCCATATT CCGGTGCGGA T C GT CCG GCA GAAGTAAAAG AAGCCTCACT 
8601 GACGACTTTA CCTTGACCAG CCCGTACTTG GCCACATGCT CCTACTGTCA CCATACTGAA CCGTCCTTTA CCCCCATTAA GATCGAGCAC GTCTGGGATG 
8701 AAGCGGACGA CAACACCATA CCCATACAGA CTTCCCCCCA GTTTGGATAC GACCAAAGCG GAGCAGCAAG CTCAAATAAG TACCCCTACA TGTCGCTCGA 

8801 CCAGGATCAT actctcaaag aaggcaccat cgatoacatc aagatcagca cctcagcacc gtgtagaagg cttagctaca aaggatactt tctcctcgcg 
8901 aa gtgtcc tc caggggacag cgtaacggtt agcatagcga gtagcaactc agcaacgtca tccacaatgg cccgcaagat aaaaccaaaa ttcgtgggac 
9001 gggaaaaata tgacctacct cccgttcacg gtaagaagat tccttocaca gtgtacgacc gtctcaaaga aacaaccgcc ccctacatca ctatgcacac 
9101 gccgggaccg catccctata catcctatct cgaggaatca tcagggaaag tttacgcgaa gccaccatcc gggaagaaca ttacgtacga gtgcaagtcc 

9201 GGCGATTACA AGACCGGAAC CGTTACGACC CGTACCGAAA TCACGGGCTC CACCGCCATC AAGCAGTGCG TCGCCTATAA CAGCGACCAA ACGAAGTGGG 
9301 TCTTCAACTC GCCGGACTCG ATCAGACACG CCGACCACAC GGCCCAAGGG AAAT7GCATT TGCCTTTCAA GCTGATCCCG AGTACCTGCA TGGTCCCTGT 
9401 TGCCCACGCG CCGAACGTAG TACACGCC7T TAAACACATC AGCCTCCAAT TAGACACAGA CCATCTGACA TTGCTCACCA CCAGGAGACT AGGGGCAAAC 
9501 CCGGAACCAA CCACTGAATG GATCATCCCA AACACGGTTA GAAACTTCAC CGTCGACCGA GATCGCCTCG AATACATATG GGGGAATCAC GAACCAGTAA 
9601 GGGTCTATGC CCAAGACTCT GCACCAGGAG ACCCTCACGG ATGCCCACAC GAAATAGTAC ACCATTACTA TCATCCCCAT CCTGTGTACA CCATCTTAGC 
9701 CGTCGCATCA GCTGCTGTGG CGATGATGAT TGGCGTAACT GTTGCAGCAT TATGTGCCTG TAAAGCGCGC CGTGAGTCCC TGACGCCATA TGCCCTGGCC 
9801 CCAAATGCCC TGATTCCAAC TTCGCTGGCA CTTTTGTGCT GTGTTAGGTC GGCTAATGCT GAAACATTCA CCGAGACCAT GAGTTACTTA TGGTCGAACA 
9901 GCCAGCCGTT CTTCTGGGTC CACCTGTGTA TACCTCTCCC CGCTGTCGTC GTTCTAATGC GCTGTTGCTC ATGCTGCCTG CCTTTTTTAG TGGTTGCCGO
10001 CGCCTACCTG CCGAACGTAG ACGCCTACGA ACATGCGACC ACTGTTCCAA ATGTGCCACA GATACCGTAT AAGGCACTTG TTGAAAGGGC AGCGTACCCC 
10101 CCGCTCAATT TGGAGATTAC TGTCATGTCC TCGGACGTTT TGCCTTCCAC CAACCAAGAO TACATTACCT GCAAATTCAC CACTGTCCTC CCCTCCCCTA 
10201 AAGTCAGATG CTGCCGCTCC TTGGAATGTC AGCCCCCCGC TCACCCAGAC TATACCTGCA ACGTCTTTGG AGGGGTGTAC CCCTTCATGT GGGGAGGAGC 
10301 ACAATGTTTT TGCGACAGTG AGAACAGCCA GATGAGTGAG CCCTACGTCG AATTGTCAGT AGATTGCGCG ACTGACCACG CGCAGGCGAT TAAGGTGCAT 
10401 ACTCCCGCGA TGAAAGTAGG ACTCCGTATA GTGTACGGGA ACACTACCAG TTTCCTAGAT GTGTACGTGA ACCGAGTCAC ACCAGGAACG TCTAAAGACC 
10501 TGAAAGTCAT AGCTGGACCA ATTTCAGCAT TGTTTACACC ATTCGATCAC AAGGTCGTTA TCAATCGCGG CCTGGTGTAC AACTATGACT TTCCGGAATA 
10601 CGGAGCGATG AAACCAGGAG C O 1 1 IC G AG A CATTCAAGCT ACCTCCTTGA CTAGCAAAGA CCTCATCGCC AGCACAGACA TTAGGCTACT CAAGCCTTCC 
10701 GCCAAGAACG TGCATGTCCC GTACACGCAG CCCGCATCTC GATTCGAGAT GTGGAAAAAC AACTCAGGCC GCCCACTGCA CGAAACCGCC CCTTTTGGGT 
10801 GCAAGATTGC AGTCAATCCG CTTCGAGCGG TGGACTCCTC ATACGGGAAC ATTCCCATTT CTATTGACAT CCCGAACGCT GCCTTTATCA GGACATCAGA 
10901 TCCACCACTG GTCTCAACAG TCAAATGTGA TGTCAGTGAG TCCACTTATT CAGCGGACTT CGGAGCGATG GCTACCCTGC AGTATGTATC CGACCGCCAA 
11001 GGACAATGCC CTGTACATTC GCATTCGAGC ACAGCAACCC TCCAAGAGTC GACAGTTCAT GTCCTGGAGA AAGGAGCGGT GACAGTACAC TTCAGCACCG
11101 CCAGCCCACA GGCGAACTTC ATTGTATCGC TGTGTGGTAA GAAGACAACA TCCAATGCAG AATGCAAACC ACCAGCTGAT CATATCGTGA GCACCCCGCA
11201 CAAAAATGAC CAAGAATTCC AACCCCCCAT CTCAAAAACt TCATCGAGTT GGCTGTTTGC CCTTTTCCCC GGCGCCTCGT CCCTATTAAT TATAGGACTT 
11301 ATGATTTTTG CTTGCAGCAT GATCCTGACT AGCACACGAA CATGACCCCT ACGCCCCAAT GACCCCACCA GCAAAACTCG ATGTACTTCC GAGGAACTCA 
11401 TCTGCATAAT GCATCAGGCT GGTATATTAG ATCCCCGCTT ACCCCGGGCA ATATAGCAAC ACCAAAACTC CACGTATTTC CGACGAAGCG CAGTGCATAA
11501 TGCTGCGCAG TGTTGCCAAA TAATCACTAT ATTAACCATT TATTCAGCGG ACGCCAAAAC TCAATGTATT TCTGAGGAAG CATGGTGCAT AATCCCATCC 
11601 AGCGTCTOCA TAACTTTTTA TTATTTCTTT TATTAATCAA CAAAATTTTG TTTTTAACAT TTC

1 MEXPWNVDV DPQSPFWQL QX5FPQFEW AQQVTPNDHA NARAFSHLAS XUELEVPTT ATILDIGSAP ARRMFSEHQY HCVCPMRSPE DPDRMMXYAS

101 KLAEXACXTT NKNLHEXDCD LRTVU3TPDA ETFSLCFHND VTCNTRAEY5 VMQDVYTNAP GTTYHQAMXG VRTLYWIGFD TTQFMF5AMA GSYPAYNTNW 

201 AOEXVLEARN IGU3TKLSE CRTGXUIMR KXEUW3RV YFSVGSTLYP EHRASLQSWH LPSVFHUCGX QSYTCRCBTV VSCEGYWKX nttFCJ I CE 

301 TVOYAVTNNS ECFU£XVn> TVKGERVSFP VCTYIPAT1C DQMTdMATO BPDDAQXU VCLNQWVIN GXTNRNTNTM QNYLLPOAQ GFSXWAJCERX 

401 EDLDNEKMLG YRERKLYYGC LWAFRTXXVH SFYRPPGTQT IVKVPASFSA FPMSSVWTTS LPMSLRQKMK LALQPXKEEX LLQVPEELVM EAXAAFEDAQ 

501 fFffiK f"! ALPPLVADXG tEAAAEWCE VEGLQAOTCA ALVETPRGKV RHPQANDRM K3QYIWSP1 SVUCMAKLAP AHPLADQVXI 1THSGRSGRY 

601 AVEPYDAXVL MPACSAVPWP EFLALSESAT LVYNEREFVN RXLYWAMHG PAXNTEEEQY KVTXAEUET EYVFBVDXXR CVXXEEASGL VLSGELTKPP 

701 YHELALEGLX TRPAVPYKVE TIGVIGT7GS GXSAUXSTV TAROLVTSGK KENCREIEAD VIRLRGMQIT SXTVDSVMLN GCHXAVEVLY VDEAFRCHAG 

801 ALLAUATVR FRKXWLCGD PXQCGFFNMM QUCVHFHHPE KDICTXTFnC FISRRCTQPV TATVSTtHYD GXMKTTNPCX KNIEtDTTGA TKPOODIIL 

901 TCFRCWVKQL QIDYPGKEVM TAAASQGLTR XGVYAVRQXV NENFIYATTS EHVNVLLTRT EDRLVWKTLQ GDPWIKQLTN VPKGKFQATI EOWEAEHXG!

1001 IAAINSPAPR TNPFSCXTNV CWAJCALEPOL ATAG1VLTCC QWSELFFQFA O0K7HSAIYA LOV1CDCFFG MDLTSGLFSX QS1PLTYHPA DSARPVAHWD 

1101 NSPGTRXYGY DHAVAAEISR RFPVFQLAGK GTQLDLQTCT TRVtSAQKNL VPVNRNLPHA LVPEHREXQ? CPVEKFLSQF KHHSVLVBE KXIEAPKKR1 

1201 EWIAPIG1AG ADKKYNlAfG FPPOARYDLV F1NIGTXYRN HHFQQCEOHA ATLXTLSRSA LNCLN7GGTL WXSYGYADR KSE0WTALA RKFYRVSAAR 

1301 PECYS3NTEM YUFRQLDNS RTRQFTTHHL NCVBSVYEO TRDGV GAAPS YRTXRENlAD CQEEAWHAA NPLGRPGEGV CRAIYKRWPN SFTDSATETO 

1401 TAKLTVCQGX XVIHAVGPDF RKHPEAEALX LLQNAYHAVA DLVNEKNDCS VAIPLLSTC1 YAAGXDRLEY SLNCLTTALD RTDADVTTYC LDKJCWKERID 

1501 AVLQUCE5VT EUCOE0ME2O DELVWIKPDS CLXGRXGFST TKGKLYSYFE GTXFHQAAXO MAEOCVLFFN DQESNEQLCA YILGETMEAI REXCPVDHNP 

1601 SSSPFXTLPC LCMYAMTPER VHRLRSNNVX EVTVCSOTL PXYXDWVQK VQCTKVYLFN FKTPAFVPAR KYIEAPEQPA AFPAQAEEA? GVVATPTPPA 

1701 ADNT5LDVTD BLDMED53E C5LFSSFSGS DNYRRQWVA DVHAVQEPAP VPPPRUCXMA RLAAARMQEE FTPPASTSSA DESIHL5FDG VStSFGSLFD 

1801 CEMARLAAAQ PPASTCPTDV PMSFGSFSDO EIEELSRRVT ESEPVLFGSF EPGEVNSIIS SRSAVSFPPR KQRRRRRSRR TEYCLTGVGG YTFSTUTGPG 

1901 KLQKKSVLQN QLTEPTLERN VLERIYAPVL DTSXEEQLXL RYQMMPTEAN XSRYQSRXVE NQXAJTTERL LSGUU.YKSA TDQPECYKTT YFKPSYSSSV 

2001 PANYSDPKFA VAVCNNYIHE NYPTVASYQI TDEYDAYLDM VOGTVACLDT ATFCPAKLRS YPKRKEYRAP N1RSAVPSAM QKTLQNVUA ATKRNCNVTQ 

2101 MREUTUJSA TPNVECFRKY ACNDEYWEEF ARKP1RITTE FVTAYVARLX GPXAAALFAX THNLVPLQEV PMDRFYMDMK RDVXVTPGTX HTEERFXVQV 

2201 IQAAEPLATA YLCGIHRELV RRLTAVLLPN 1HTLFDMSAE DFDAUAEHF KQGOPVLETO IASFDKSQDO AMALTCLMIL EOLGVDQPLL DLIECAFCE! 

2301 SSTHLFTCTR FXFGAMMXS0 MFLTLFVNTV LNW1ASRVL EER1XTSKCA AF1GDDK1IH GWSOKEMAE RCATWLNMEV XIIDAYIGER PPYFCGGFIL 

2401 QDSVTSTACR VADPUCRLFK LGXPLPADDE QDEDRRRALL DETXAWFRVG ITDT1AVAVA TRYEVDNTTP VLLALRTPAQ SXRAFQAIRG EDCHLYGCPK 



Amino Acid Sequence of the Structural Polyprotein 



1 MNRGFFNMLO RRPFPAPTAM WRPRRRRQAA PMPARNGLAS QIQQLTTAVS ALVICOATRF QTPRPRPPPR QXXQAPXQPP XPXKPXTQEX XXXQPAXPXP

101 GXRQRMALKL EAORLFDVKN EOGOVIGHAL AMEGXVMXPL HVKGTIOHPV LSKUCFTX5S AYDMEFAQLP VNMRSEAFTY TSEHPEGFYN WHHGAVQYSG 

201 GRFTTFRGVG GRGDSGRPtM DNSGRWAIV LCGADEGTRT ALSVVTWNSX GKTDCTTFEG TEEWSAAPLV TAMC1XGNVS FPCNRPFTCY TREPSRALDI 

301 LEENVNHEAY DTU-NAILRC GSSGR5XRSV TOOFTLTSPY LGTCSYCHHT EPCFSPK1E QVWDEADONT IRIQTSAQFO YDQSGAASSN KYRYMSLEQD 

401 KTVKEGYMDD KBTSCPCR RLSYKGYFLL AKCPPGDSVT VSIASSNSAT SCTMARXDCP KFVGREXYDL PPVHGKXIPC TVYDRUCETT AGYTTMHRPG 

501 PHAYTSYLEE SSGKVYAXPP SGKNTTYECX CGDYKTOTVT TRTETTCCTA OCQCVAYXSD GTXWVFNSPD SWHAOHTAQ GXLHLPFKU PSTCMVPVAH 

601 APNWKGFKH I5LQL0TDHL TLLTTRRLGA NPEPTTEWW GNTVRNFTVD RDGLEYTWGN HEPVRVYAQE SAPGOPHGWP HE1VQHYYHR HPVYTTLAVA 

701 SAAVAMMIGV TVAALCACKA RRECLTPYAL APNAVIPTSL AtLCCVRSAH AETFTETMSY LWSNSQPFFW VQLCTPUAAV WLMRCCSCC LPFLWAGAY 

801 LAKVDAYEHA TTVPNVPQIF YKALVERAGY APLNLETTVM SSEVLPSTNQ EYTTCKFTTV VPSPKVRCCG SLECQPAAHA DYTCXVFCCV YPFMWGGAQC 

901 FCDSENSQMS EAYVELSVDC ATDHAQADCV HTAAMXVGLR IYYGKTTSFL DVYVNGVTPG TSXDLXYIAG PISAUTPFD HXW1NRGLV YNYOFPEYGA 

1001 MRPGAFCDIQ ATSXTSXDU ASTDIRLLXP SAXNVHVPYT QAA5GFEMWK NN5GRPLQET APFGCXIAVN PLRAVDCSYG NTP1SIDIPN AAFIRTTOAP 

1101 LVSTVKCDVS ECTYSADFGG MATLQYVSDR ECQCPVHSHS STATLQE5TV KVLEXGAVTV HFSTASPQAN FIVSLCGXKT TCNAECKPPA OHIVSTPHXH

1201 DQEFQAA1SX TSWSWLFALF GGAS3LU1G LMIFACSMML TSTRR 

1 WITCNCGGCG TAGTATACAC TATTCAATCA AACACCCCAC CAATTGCACT ACCATCACAA TCCACAACCC ACTACTTAAC CTACACCTAC ACCCGCAOAG

101 rcccnrcTC ctccaactcc aaaagagctt cccccaattt cacctactag cacagcaggt cactccaaat caccatccta atgccagagc attttcccat

201 CTCCCCACTA AACTAATCCA CCTCCACCTT CCTACCACAC CCACCATTTT CCACATACCC ACCCCACCCC CTCCTAGAAT CTTTTOCCAG CACCAOTACC 
301 ATTGCCTTTC CCCCATCCCT ACTCCACAAC ACCCCCACCC CATGATCAAA TATCCCACCA AACTCCCCCA AAAACCATCC AACATTACCA ATAAGAACTT 
401 CCATCACAAO ATCAAGGACC T CCCG ACCGT ACTTCATACA CCCCATCCTC AAACCCCATC ACTCTGCTTC CACAACCATC TTACCTGCAA CACCCCTOCC 
501 CAGTACTCCC TCATCCACCA CCTCTACATC AA CGCT CC CG CAACTATTTA CCATCACGCT ATGAAACCCG TCCCGACCCT GTACTGCATT CCCTTCCATA 
601 CCACCCACTT CATCTTCTCG CCTATCCCAC CTTCCTACCC TCCCTACAAC ACCAACTCCC CCCACGAAAA ACTCCTCCAA CCCOCTAACA TCCCACTCTG 
701 CACCACAAAC CTCACTCAAC CCACGACACC AAAGTTCTCG ataatgagga ACAACCACTT caaccccccc tcaccccttt atttctccct tcoatcoaca 
801 CTTTACCCAG AACACAGAGC CAGCTTGCAG AGCTCGCATC TTCCATCGGT GTTCCACCTG AAAGGAAAGC A G T CG T A CAC TTGOCGCTGT GATACAGTGG
901 TGAGCTGCGA AGGCTACGTA GTGAAGAAAA TCACCATCAG TCCCGGGATC ACGGGAGAAA CCGTGGGATA CGCGGTTACA AACAATAGCG AGGGCTTCTT 
1001 GCTATGCAAA GTTACCCATA CAGTAAAAGG AGAACCGGTA TCGTTCCCCG TGTGCACGTA TATCCCGGCC ACCATATGCG ATCAGATGAC CGGCATAATG 
1101 GCCACGGATA tctcacctga cgatccacaa aaacttctgg TTGCCCTCAA CCAGCGAATC GTCATTAACG GTAAGACTAA CAGGAACACC aataccatcc 
1201 AAAATTACCT TCTGCCAATC ATTGCACAAG GGTTCAGCAA ATGGGCCAAG GAGCGCAAAG AAGACCTFGA CAATGAAAAA ATGCTGGCTA CCAGAGAGCG 
1301 CAAGCTTACA TATGGCTGCT TGTGGGCGTT TCGCACTAAG AAAGTGCACT CGTTCTATCG CCCACCTCGA ACGCAGACCA TCGTAAAAGT CCCAGCCTCT 
1401 TTTAGCGCTT TCCCCATGTC ATCCGTATGG ACTACCTCTT TGCCCATGTC GCTCAGGCAG AAGATAAAAT TGGCATTACA ACCAAAGAAO GAGGAAAAAC 
1501 TGCTGCAAGT CCCCGAGGAA TTAGTCATCG AGGCCAAGGC TGCTTTCGAG CATGCTCAGG AGGAATCCAG AGCGGAGAAG CTCCGAGAAG CACTCCCACC 
1601 ATTAGTGGCA GACAAAGGTA TCGAGGCAGC CGCG0AAGTT GTCTCCGAAG TGGAGGGGCT CCAGGCGGAC ATCGGAGCAG CACTCGTCOA AACCCCGCGC 
1701 GGTCATGTAA GGATAATACC ACAAGCAAAT GACCGTATGA TCGGACAGTA CATCGTTGTC TCGCCAACCT CTGTCCTGAA GAACGCTAAA CTCGCACCAG 

1801 CACACCCGCT AGCAGACCAG gttaagatca taacgcactc cggaagatca ggaaggtatg cagtcgaacc atacgacgct aaagtactga tgccagcagg 
1901 AAGTGCCGTA ccatggccag aattcttacc actgagtgag agcgccacgc tagtctacaa cgaaagagag tttgtgaacc gcaagctgta ccatattgcc 

2001 ATGCACGGTC CCGCTAAGAA TACAGAAGAG GAGCAGTACA AGGTTACAAA GGCAGAGCTC GCAGAAACAG ACTACCTCTT TGACGTGGAC AAGAAGCGAT 
2101 GCGTCAAGAA GGAAGAAGCC TCAGGACTTG TCCTCTCGGO AGAACTGACC AACCCGCCCT ATCACGAACT AGCTCTTGAO GGACTGAAGA CTCGAOOCGT 
2201 GGTCCCGTAC AAGGTTGAAA CAATAGGAGT GATAGGCCCA CCAGGATCCG GCAAGTCGGC TATCATCAAG TCAACTGTCA CGGCACGTGA TCTTGTTACC 
2301 AGCGGAAAGA AAGAAAACTG CCGCGAAATT CAGGCCGATG TCCTACGGCT GAGGGGCATG CAGATCACGT CGAAGACAGT GGATTCGGTT ATGCTCAACG 
2401 GATGCCGCAA AGCCGTAGAA GTGCTGTATG TTGACGAAGC GTTCGCGTGC CACGCAGGAG CACTACTTGC CTTGATTGCA ATCGTCAGAC CCCGTCATAA
2501 GCTAGTGCTA TGCGGAGACC CTAAGCAATG CGGATTCTTC AACATGATCC AACTAAAGGT ATATTTCAAC CACCCGGAAA AAGACATATO TACCAAGACA 
2601 TTCTACAAGT ttatctcccg acgttgcaca CAGCCAGTCA CGGCTATTGT ATCGACACTG CATTACGATG GAAAAATGAA AACCACAAAC CCGTGCAAGA
2701 AGAACATCGA AATCGACATT ACACCGGCCA CGAAGCCGAA GCCAGGGGAC atcatcctga CATGCTTCCC CGGGTGGGTT aagcaactgc AAATCGACTA 
2801 tcccggacat gaggtaatga cagccccggc ctcacaaggo ctaaccagaa aaggagtata tgccgtccgo CAAAAAGTCA ATGAAAACCC GCTCTACGCG 

2901 ATCACATCAG AGCATGTGAA CGT G C TGC TC ACCCGCACTG AGGACACGCT ACT AT CO AAA ACTTTACAGG GCGACCCATG GATTAAGCAG CTCACTAACG 
3001 TACCAAAACO AAATTTTCAA GCCACCATCG AGGACTGGGA AGCTGAACAC AAGGGAATAA TTGCTGCGAT AAACAGTCCC GCTCCCCGTA CCAATCCOTT 
3101 CAGCTGCAAG ACTAACGTTT CCTGGGCGAA ACCACTGOAA CCGATACTGG CCACGCCCGG TATCGTACTT ACCGGTTCCC AGTGGAGCCA GCTGTTCCCA 
3201 CAGTTTGCAG ATGACAAACC ACACTCGGCC ATCTACGCCC TGGACGTAAT CTCCATTAAG TTTTTCCGCA TGGACTTGAC AAGCCGACTG TTTTCCAAAC 
3301 AGAGCATCCC GTTAACGTAC CATCCTGCC G ATTCAGCGAG GCCAGTAGCT CATTGGGACA ACAGCCCAGG AACCCGCAAG TATGCGTACG ATCACGCCGT 
3401 TGCCGCCCAA CTCTCCCGTA GATTTCCGGT GTTCCAGCTA GCTGGGAAAG GCACACAGCT TGATTTGCAG ACGCGCAOAA CTAGAGTTAT CTCCGCACAG 
3501 CATAACTTGO TCCCAGTGAA CCGCAATCTC CCCCACCCCT TAGTCCCCGA GCACAAGGAG AAACAACCCG GCCCGGTCAA AAAATTCTTG AGCCAGTTCA 
3601 AACACCACTC CGTACTTGTG GTCTCAGAGG AAAAAATTCA AGCTCCCCAC AAGAGAATCG AATGGATCGC CCCGATTGCC ATAGCCGCCG CTGATAACAA 
3701 CTACAACCTO GCTTTCGGCT TTCCGCCGCA GGCACGGTAC GACCTGGTGT TTATCAATAT TGGAACTAAA TACAGAAACC ATCACTTTCA GCAGTGCGAA 
3801 GACCATCCCO CCACCTTCAA AACOCTCTCG CO J 1 OCCCCC TCAACTCCCT TAACCCCGGA GGCACCCTCG TCGTGAAGTC CTACCGTTAC GCCGACCGCA
3901 ATACTCACGA CCTAGTCACC CCTCTTCCCA GAAAATTTGT CAGA CTC 7T CT CCAOCCACCC CAGA GTGCCT CTCAACCAAT ACAOAAATCT ACCTGATCTT 
4001 CCCACAACTA CACAACACCC CCACACCACA ATTCACCCCC CATCATCTCA ATTGTGTGAT VTLU1CCGTO TACCACCCTA CAAGAGACCC AGTTGGAGCC 
4101 CCACCCTCAT ACCCCACTAA AACCCAGAAC ATTGCTGATT GTCAACACGA ACCACTTCTC AATCCACCCA ATC C OCT CCO CACACCAGCC CAACGAGTCT 
4201 CCC C T C CCAT CTATAAACGT TCGCCCAACA CTTTCACCCA TTCACCCACA CAGACCCCCA CCCCAAAACT CACTCTCTCC CAACGAAAGA AAGTOATOCA
4301 CCCCGTTGGC CCTGATTTCC GGAAACACCC AGAGCCAGAA GCCCTGAAAT TGCTGCAAAA CCCCTACCAT GCACTCGCAG ACTTAGTAAA TGAACATAAT 
4401 ATCAAGTCTG TCGCCATCCC ACTCCTATCT ACAGGCATTT ACGCAGCCCO AAAAGACCGC CTTGAACTAT CACTTAACTG CTTGACAACC GCGCTAGATA 
4501 GAACTGATGC CGACGTAACC ATCTACTCCC TGGATAACAA GTGCAAGGAA AGAATCGACG CGGTCCTCCA ACTTAAGGAG TCTGTAATAG AGCTGAAGGA 
4601 TGAGGATATG CAGATCGACG ACGACTTACT ATGGATCCAT CCGGACAGTT GCCTOAAGGG AAGAAAGGGA TTCACTACTA CAAAAGGAAA GTTGTATTCG 
4701 TACTTTGAAG GCACCAAATT CCATCAAGCA GCAAAAGATA TGGCGGAGAT AAAGGTCCTG TTCCCAAATG ACCAGGAAAG CAACGAGCAA CTGTGTCC CT 
4801 ACATATTGGG GGaGACCATG GAAGCAATCC CCGAAAAATG CCCGGICCAC CACAACCCGT CGTCTAGCCC CCCAAAAACG CTGCCCTGCC TCTGCATGTA
4901 TGCCATCACG CCAGAAAGGG TCCACAGACT CAGAAGCAAC AACGTCAAAG AAGTTACAGT ATGCT CC TC C A CC C C CC UC CAAAGTACAA AATCAAOAAC 
5001 GTTCAGAAGG TTCAGTGCAC AAAAGTAGTC CTGTTTAACC CGCATACCCC TGCATTCGTT CCCCCCCGTA AGTACATAGA AGCGCCAGAA CAGCCTGCAG 
5101 CTCCGCCTGC ACAGCCCGAG GAGGCCCCCG AAGTTGCAGC AACACCAACA CCACCTGCAG CTGATAACAC CTCGCTTGAT GTCACGGACA TCTCACTGGA 
5201 CATGGAAGAC AGTAGCGAAG GCTCACTCTT TTCGACCTTT AGCGGATCGG ACAACTCTAT TACTACTATG GACAGTTGGT CGTCAGGACC TAGTTCACTA
5301 GAGATAGTAG ACCGAAGGCA GGTGGTGGTG GCTGACGTCC ATG CC GT C CA AGAGCCTCCC CCTGTTCCAC CCCCAAGCCT AAAGAAGATG GCCCGCCTGG 
5401 CAGCGGCAAC AATGCAGGAA GAGCCAACTC CACCGGCAAG CACCAGCTCT CCGCACCAGT CCCTTCACCT TTCTTTTCGT GGGGTATCCA TGTCCTTCGG 
5501 ATCCCmTC GACGGAGAGA TGGGCGCCTT GGCAGCCGCA CAACCCCCGG CAAGTACATG CCCTACGGAT GTGCCTATGT CTTTCGGATC GTTTTCCGAC 
5601 GGAGAGATTG AGGAGCTGAG CCGCAGAGTA ACCGAGTCTG AGCCCCTCCT GTTTGGGTCA TTTGAACCGG GCGAAGTGAA CTCAATTATA TCGTCCCGAT 
5701 CA GTTGTA TC TTTTCCACCA CGCAAGCAGA GACGTAGACG CAGGAGCAGG AGGACCGAAT ACTGACTAAC CGGGGTAGGT GGGTACATAT T7TCGACGGA 
5801 CACAGGCCCT GGGCACTTGC AAATGGAGTC CGTTCTGCAG AATCAGCTTA CAGAACCGAC CTTGGACCGC AATGTTCTGG AAAGAATCTA CGCCCCGGTG 
5901 CTCGACACGT CGAAAGAGCA ACAGCTCAAA CTCAGGTACC AGATGATGCC CACCGAAGCC AACAAAAGCA GGTACCAGTC TAGAAAAGTA GAAAATCAGA 
6001 AAGCCATAAC CACTGAGCGA CTGCTTTCAG GGCTACGACT GTATAACTCT GCCACAGATC AGCCAGAATG CTATAAGATC ACCTACCCGA AACCATCGTA 
6101 TTCCAGCAGT GTACCGGCGA ACTACTCTGA CCCAAAGTTT GCTGTAGCTG TTTGCAACAA CTATCTGCAT GAGAATTACC CGACGGTAGC A7XTTATCAG 
6201 ATCACCGACC AGTACGATGC TTACTTCGAT ATGGTAGACG GGACAGTCGC TTGCCTAGAT ACTGCAACTT TTTGCCCCGC CAAGCTTAGA AGTTACCCGA 
6301 AAAGACACGA GTATAGAGCC CCAAACACTC GCAGTGCGGT TCCATCAGCG ATGCAGAACA CCTTGCAAAA CGTGCTCATT GCCGCGACTA AAAGAAACTG 
6401 CAACGTCACA CAAATGCGTG AATTGCCAAC ACTGGACTCA GCGACATTCA ACGTTGAATC CTITCGAAAA TATGCATGTA ATGACCAGTA TTGGGAGGAG 
6501 TTTGCCCGAA AGCCAATTAG CATCACTACT GAGTTCGTTA CCGCATACGT GGCCAGACTG AAAGGCCCTA AGGCCGCCGC ACTGTTCGCA AAGACCCATA 
6601 AT7TGGTCCC ATTGCAAGAA GTGCCTATGG ATAGGTTCGT CATGGACATG AAAAGAGACG TGAAAGTTAC ACCTGGCACG AAACACACAG AAGAAAGACC 
6701 CAAAGTACAA GTCCTACAAG CCGCAGAACC CCTCCCGACC CCTTACCTGT GCGGGATCCA CCGCGAGTTA GTGCGCAGGC TTACAGCCCT CTTCCTACCC 
6801 AACATTCACA CGCTTTTTGA CATGTCGGCG GAGGACTTTG ATGCAATCAT ACCAGAACAC TTCAAGCAAG GTGACCCGGT ACTGGAGACG GATATCGCCT 
6901 CGTTCGACaa AAGCCAAGAC GACCCTATGG CGTTAACTGO CCTGATGATC TTGGAAGACC TGGCTGTGGA CCAACCACTA ctcgacttga tcgagtgcgc 
7001 CTTTCGAGAA ATATCATCCA CCCATCTCCC CACGCGTACC cgtttcaaat tcggggcgat gatgaaatcc ggaatgttcc tcacgctctt tgtcaacaca 
7101 gttctgaatg tcgttatcgc cagcagagta ttggaggagc ggcttaaaac gtccaaatgt CCAGCATTTA TCGGCGACGA CAACATCATA CACCGAGTAC 
7201 TATCTCACAA AGAAATGGCT GAGAGGTGTG CCACCTGGCT CAACATGGAG GTTAAGATCA TTGACCCAGT CATCCGCGAG AGACCGCCTT ACTTCTGCCG 
7301 TGCATTCATC TTGCAAGATT CGCTTACCTC CACAGCGTCT CGCGTGGCGG ACCCCTTCAA AAGGCTGTTT AAGTTGGGTA AACCGCTCCC AGCCGACGAC 
7401 GAGCAACACG AAGACAGAAG ACGCGCTCTG CTAGATGAAA CAAAGCCGTO GTTTAGAGTA GGTATAACAG ACACCTTAGC AGTGGCCGTG CCAACTCCGT 
7501 ATGAGGTAGA CAACATCACA CCTGTCCTGC TGGCATTGAG AACTTTTCCC CACAGCAAAA GAGCATTTCA AGCCATCAGA GGGGAAATAA AGCATCTCTA 
7601 CGGTGGTCCT aaatagtcag CATAGCACAT TTCATCTGAC TAATACCACA ACACCACCAC CATGAATAGA GGATTCTTTA ACATCCTCGG CCCCCGCCCC 
7701 TTCCCGGCCC ccactgccat gtggaggccg cggagaagga ggcaggcggc cccgatgcct GCCCGCAATG GCCTGGCTTC CCAAATCCAG CAACTGACCA 
7801 cagccgtcag tcccctagtc attgcacagg caactagacc tcaaacccca cgcccacgcc cgcccccccg ccagaagaag cacgcgccaa agcaaccacc 
7901 GAAGCCGAAG aaaccaaaaa cacaggagaa gaagaagaag caacctgcaa aaoocaaacc cggaaxgaga caacctatcc cactcaagtt ccacccccac 

8001 AGACTGTTCG ACCTCAAAAA TCACGACCCA CATCTCATCC GGCACGCACT CCCCATCCAA GGAAAGGTAA TGAAACCACT CCACGTGAAA GGAACTATTG 
8101 ACCACCCTCT GCTATCAAAG CTCAAATTCA CCAACTCCTC ACCATACCAC ATCCACTTCC CACAGTTGCC CGTCAACATG AGAAGTCAGC CCTTCACCTA 
8201 CACCACCGAA CACCCTGAAG CCTTTTACAA CTCCCACCAC CCACCCGTCC ACTATACTCC ACCTACATTT ACC A TCCC CC CCCCAOTACC AGCCAOACCA 
8301 GACACTCGTC GTCCCATTAT CCATAACTCA GGCCGGGTTG TCCCCATAGT CCTCCGACCG GCTGATGAGG CAACAAGAAC TGCCCTTTCG GTCGTCACCT 
8401 GGAATAGCAA AGGGAAGACA ATCAAGACAA CCCCGGAAGG GACAGAAGAO TGGTCTCCAG CACCACTGGT CACGGCCATG TG CTTGCTTC GAAACGTGAG 
8501 CTTCCCATGC AATCGCCCGC CCACATGCTA CACCCGCGAA CCATCCAGAG CTCTTGACAT CCTTGAAGAG AACGTGAACC ACGAGGCCTA CGACACCCTG 
8601 CTCAACCCCA TATTCCGGTG CGGATCGTCC GGCAGAAGCA AAAGAAGCGT CACTGACGAC TTTACCTTGA CCAGCCCGTA CTTGGGCACA T GC T CC T A CT 
8701 GTCACCATAC TGAACCGTGC TTTAGCCCGA TTAACATCGA CCAGGTCTCG GATGAAGCGG ACGACAACAC CATACCCATA CAGACTTCCC COCAGTTTGG 
8801 ATACCACCAA AGCGGAGCAG CAAGCTCAAA TAAGTACCGC TACATGTCGC TCGAGCAGGA TCATACCCTC AAAGAAGGCA CTATGGATGA CATCAAGATC 
8901 AGCACCTCAC GACCGTGTAG AAGGC1TAGC TACAAAGGAT ACTTTCTCCT CGCGAAGTGT CCTCCAGGGG ACAGCGTAAC GGTTAGTATA GCGAGTAGCA 
9001 ACTCAGCAAC GTCATGCACA ATGGCCCGCA AGATAAAACC AAAATTCGTG GGACGGGAAA AATATGACCT ACCTCCCGTT CACGGTAAGA AGATTCCTTG 
9101 CACAGTGTAC CA CC CTCT G A AAGAAACAAC CGCCGCCTAC ATCACTATGC ACAGGCCGGG ACCGCACGCC TATACCTCCT ATCTCGAGGA ATCATCAGGG 
9201 AAAGTCTACG CGAACCCACC ATCCGGAAAG AACATTACGT ACGAGTGCAA GTCCGGCGAT TACAAGACCG GTACCGTTAC GACCCGTACC GAAATCACGG 
9301 GCTCCACCGC CATCAAGCAG TGCGTCCCCT ATAAGAGCGA CCAAACGAAG TGGOTCTTCA ATTCGCCGGA CTTGATCAGA CATCCCGACC ACACGCCCCA 
9401 AGGGAAATTG CATTTACCTT TCAAGCTGAT CCCGAGTACC TGCATGGTCC CTCTTCCCCA CCCGCCGAAC GTAGTACACG GCTTTAAACA CATCAGCCTC 
9501 CAATTAGACA CAGACCACCT GACATTGCTC ACCACCAGGA GACTAGCGGC AAATCCGGAA CCAACTACTG AATGGATCAT CGGAAAGACG GTTAGAAACT 
9601 TCACCGTCCA CCCAGATGGC CTGGAATACA TATGGGGCAA TCACGAACCG GTAACGGTCT ATGCCCAAGA GTCTGCACCA GGAGACCCTC ACGGATGGCC 
9701 ACACGAAATA GTACAGCATT ACTACCATCG CCATCCTGTG TACACCATCT TAGCCGTCGC ATCAGCTGCT GTGGCGATGA TGATTCGCGT AACTGTTGCA 
9801 GCATTATGTG CCTGTAAACC CCCCCGTGAG TGCCTCACGC CATATCCCCT GGCCCCAAAT GCCGTGATTC CAACTTCGCT GGCACT7TTC TCCTGTGTTA 
9901 GGTCGGCTAA TCCTGAAACA TTCACCGAGA CCATGAGTTA CCTATCGTCG AACAGCCAGC CATTCTTCTG GGTCCAGCTG TGTATACCCC TGGCCGCTGT 
10001 CATCCTTCTA ATGCGCTGTT CCTCATGCTG CCTGCCTTTT TTAGTGGTTG CCGGCCCCTA CCTGGCGAAG GTAGACGCCT ACCAACATGC GACCACTGTT 
10101 CCAAATGTGC CACAGATACC GTATAAGGCA CTTGTTGAAA GGGCAGGGTA CGCCCCGCTC AATTTGGAGA TTACTGTCAT GTCCTCGGAG G7TTTGCC1 1 ' 
10201 CCACCAACCA AGAGTACATC ACCTGCAAAT TCACCACTGT GGTCCCCTCC CCTAAAGTCA AATGCTGCGG CT C CTT G CAA TGTCAGCCCG CCGCTCACGC 
10301 AGACTATACC TGCAAGGTCT TTGGAGGGGT GTACCCCTTC ATGTGGGG AG GACCACAATG TTTTTGCGAC AGTQAGAACA GCCAGATGAG TGAGGCGTAC 
10401 GTCGAATTCT CAGCAGATTG CGCGACTGAC CACGCGCAGG CGATTAAGCT CCATACTCCC GCGATGAAAG TAGCACTACG TATAGTGTAC GGGAACACTA 
10501 CCAGnTCCT AGATGTGTAC GTGAACGGAG TCACACCAGG AACGTCTAAA GACCTGAAAG TCATAGCTGG ACCAATTTCA GCATCGTTTA CACCATTCGA 
10601 TCACAAGGTC GTT ATCCATC CCCGCCTGGT GTACAACTAT GACTTCCCGG AATACGGAGC GATCAAACCA GGAGCGTTTG GAGACATTCA AGCTACCTCC 
10701 TTGACTAGCA AAGATCTCAT CCCCACCACA GACATTAGAC TACTCAAGCC TTCCGCCAAO AACGTGCATO TCCCOTACAC GCAGGCCGCA TCTGGATTCG 
10801 AGATGTCGAA AAACAACTCA GCCCGCCCAC TGCAGGAAAC CGCCCCTTTC GGGTGCAAGA TTGCAGTCAA TCCGCTTCCA GCGGTGGACT GCTCATACGG 
10901 GAACATTCCC ATCTCTATCG ACATCCCGAA CGCTCCCTTT ATCAGGACAT CAGATGCACC ACTGGT C TCA ACAGTCAAAT GTCATGTCAG TGAGTGCACT 
11001 TACTCAGCGG ACTTCGGCGG GATGGCTACC CTGCAGTATC TATCCGACCG CGAAGGACAA TGCCCTGTAC ATTCGCATTC GAGCACAGCA ACCCTCCAAG 
11101 AGTCGACAGT TCATGTCCTG GAGAAAGCAG CGGTGACACT ACACTTCAGC ACCGCGAGCC CACAGGCGAA CTTTATTGTA TCGCTGTGT O GTAAGAAGAC 
11201 AACATGCAAT GCAGAATGCA AACCACCACC TOACCATATC GTGAGCACCC CGCACAAAAA TGACCAAGAA TTCCAAGCCG CCATCTCAAA AACTTCATGG 
11301 AGTTGGCTGT TTGCCCTTTT CGGCGGCGCC TCGTCGCTAT TAATTATAGG ACTTATGATT TTTGCTTGCA GCATGATGCT GACTACCACA CGAAGATGAC 
11401 CCCTACCCCC CAATGACCCG ACCAGCAAAA CTCGATGTAC TTCCGAGGAA CTGATGTGCA TAATGCATCA GGCTGGTATA TTAGATCCCC GCTTACCCCG 
11501 GGCAATATAG CAACACCAAA ACTCGACGTA TTTCCGAGGA AGCGCAGTGC ATAATGCTGC GCAGTGTTCC CAAATAATCA CTATATTAAC CATTTATTTA 
11601 GCGGACGCCA AAACTCAATG TATTTCTGAG GAAGCATGGT GCATAATGCC ATCCAGCGTC TGCATAACTT TTTATTATTT CTTTTATTAA TCAACAAAAT 
11701 TTTGTTTTTA ACATTTN 
2. Amino Acid Sequence of the Nonstructural Polyprotein 



1 MEXPWNVDV DPQSPFWQL QKSFPQFEW AQQVTPNDHA NARAFSHLAS KUELEVPTT ATILD1GSAP ARRMFSEKQY KCVCPMRSPE DPDRMMXYAS 

101 KIAEXACXTT NKNLHEXKD LRTVLDTPDA ET7SLCFHND VTCKTRAEYS VMQDVYTNAP GTTYKQAMXG VRTLYWIGFD TTQFMFSAMA GSYFAYNTNW 

201 ADHCVLEARN IGLCSTKLSE GRTGXLSEMR KKOJCPG5RV YFSVOTLYP EHRASLQSWH LPSVFKUCGK QSYTCRCDTV YSCEGYWXX rfBPgl T GE 

301 TVOYAVTNNS ECFIXCXYTD TVKGERVSFP VCTTTPATTC DQMTCftlATD K7DDAQKLL VGLNQRJYTN CKTNRKTNTW QNYLLTOAQ GFSXWAXEKX 

401 EDLDNEXMLG "TRERXLTYGC LWAFRTXXVH SFYRPPGTQT I VKVPA SFSA FPMSSVWTT5 LPMSLRQKQC LALQPXXEEX LLQV7EELVM EAXAAFEDAQ 

501 EEWAfKI.BR ALPPLVADXG EAAAEWCE VEGUJADIGA ALVETPRGHV RUPQANDRM IGQYIWSPT SVLXNAK1AP AHPIADQVKI mSGSSGRY 

601 AVEPYDAKVL MPAGSAVPWP EFLAUESAT LVYNEBEFVN RXLYK1AMHO PAXNTEEEQY KVTXAEiAET EYVFDVDXXR CVKKEFASGL YL5GEITNPP 

701 YHELAUGLK TRPWPYXVE T7GVIGAPGS GK5AIDOTV TARDLVTSGX KENOLEJQAD VLRLWjMQTT SXTYDSVMLN GOUCAVEVLY VDEAFACHAG 

801 ALLAUAIVR PRHKWLCGD PXQCGfTNMM . QIXVYFNHPE XDICTXTFYX FBSRCTQFV TAIVSOHYD GXMKTTNPOC KNIEJDITGA TXPKPGDUL 

901 TCFRGWVXQL QIDYPGHEVM TAAASQGLTR XGVYAVRQXV NENPLYAIT3 EHVNVLLTRT EORLVWXTLQ GDPWOCQLTN VPKGNFQATI EDWEAEHKGI 

1001 IAAJNSPAPR TNPPSCKTWV CWAKXLEPIL ATAG1VLT GC QW SELFPQFA DDXPH5AIYA UWICOCFFO MOLTSGLFSX QSIPLTYHPA DSARPVAHWD 

1101 NSPGTRXYGY DHAVAAELSR RFFVFQLAGX GTQLDLGTGR TRVBAQHNL VpVNRNLPHA LVPEHXEXQP GPVXXFLSQP KHHSVLWSB EXIEAPHXRJ 

1201 EWIAPIGIAG ADKNYNLAFG FPPQARYDLV FINIGTKYRW HHFQQCBDHA ATLXTLSRSA LWCLNPGGTL WXSYGYADR NSEDWTALA RXFVRVSAAR 

1301 PECVSSNTEM YUFRQLDNS RTRQFTPHHL NCVBSVYEG TRDGYCAAPS YRTKREN1AD CQEEAWNAA NPLGRPGEGV CRAIYXRWPN SFTDSATETG 

1401 TAXLTVCQGK KVIHAVGPOF RXHPEAEALX LLQNAYKAVA OLVNEHMXS VAIPLLSTGT YAAGXORtEV SLNCLTTALD RTDADVTIYC LDKKWKEJUD 

1501 AVLQUCESV1 ELXOEDMEIO DELVWIHPDS CUCGBJCGF5T TXGXLYSYFE GTXFHQAAXO MAEOCVLFPN DQESNEQLCA YILGETMEAI REXCFVDHNP 

1601 SSSPPKTLPC LCMYAMTPER VHRLRSNNVX EVTVCSSTPL PKYX1XNVQK VQCTXWLFN PKTPAFVPAR XYtEAPEQPA APPAQAEEAP EVAATPTPPA 

1701 AONT3LDVTD ISLOMEDSSE GSIFSSFSGS DN SfTSMDS W SSGPSSLEIV DRRQVWADV HAVQEPAPVp PPRLXXMARL AAARMQEEPT PPASTSSADE 

1801 SLHLSFGGVS MSFG5LFDGE MGALAAAQPP ASTCPTDVPM SFGSFSDGE1 EELSBAVTES EPVLFG5FEP GEYN5H3SR SWSFPPRXQ RRRRXSRRTE 

1901 Y 



3. Amino Acid Sequence of the Structural Polyprotein 

1 MNRGFFNMLG RRPFPAPTAM WRPRRRRQAA PMPARNGLAS QtQQLTTAVS ALVIGQATRP QTPRPRPPPR QKXQAPXQPP KPXKPXTQEK KXXQPAXPXP 

101 CKRQRMALKL EAORLFDVKN EDGDVTCHAL AMEGXVMXPL HVXGTTDHPV UKUCFTKSS AYDMEFAQLP VNMRSEAFTY TSEHPEGFYN WHKGAVQYSG 

201 CRFTTPRGVC GRCDSCRP1M DNSGRWAIV LCGADEGTRT ALSWTWNSK G K UK 11 P EG TEEWSAAPLV TAMCLLGNV3 PPCNRPPTCY TREPSRALD1 

301 LEENYNHEAY DTLLNAlUtC GSSGRSKRSV TDDPTLTSPY LGTCSYCHHT EPCFSP1XIE QVWOEAODNT miQTSAQFG YDQ5GAA55N XYRYMSLEQD 

401 HTVXEGTMDD DCISTSGPCR RLSYXGYFLL AXCPPCDSVT VSASSNSAT SCTMARXDCP XFVCREXYDL PPVHGXXIPC TVYDRIXETT AGYTTMHRPG 

501 PHAYTSYLEE SSGKVYAXPP SGXNTTYECK CGDYKTOTVT TRTEI7XCTA DCQCVAYXSO QTXWVFNSPD URHAOHTAQ GXLHLPFKU PSTCMVPVAH 

601 APNWHCFXH tSLQLDTDHL TLLTTRRLGA NPEPTTEWn CXTVRNFTVD ROGLEY1WGN HEPVRVYAQE SAPGDPHGWP HEIVQHYYHR HPVYTILAVA 

701 SAAVAMMIGV TVAALCACKA RRECLTPYAL APNAV1PTSL ALLCCVR5AN AETPTETMSY LWSNSQPFFW VQLCIPLAAV IYLMRCCSCC LPFLWAGAY 

801 LAXVDAYEHA TTVPNVPGIP YXALVEXAGY APLNLETTVM SSEVLPSTNQ EYITCXFTTV VPSPXVXCCG SLECQPAAHA DYTCXVFGGV YPFMWGOAQC 

901 FCDSENSQMS EAYVELSADC ATDHAQADCV KTAAMXVGLX IVYGMTRFL DVYVNGVTPG TSXDLXVUG PBASFTPFD HXW1HRGLV YNYDFPEYGA 

1001 MKPGAFGDtQ ATSLTSKDU ASTDIRLLXP SAXNVHVPYT QAASGFEMWK KNSGRPLQET APFGCX1AVN PLRAVDCSYG NIPISIDIPN AAFIRT3DAP 

1101 LV5TVXCDVS ECTYSADFGG MATLQYVSDR EGQCPVHSHS STATLQESTV HVLEXGAVTV HFSTASPQAN F7V5LCGKKT TCNAECXPPA DH1VSTPHXN 

1201 DQEFQAA1SX TSWSWLFALF GGASSLUIG LMIFACSMML TSTRR 
Nucleotide Sequence of S55 

1 AT T GGCG GCG TACTACACaC TATTGAATCa AACAOOCCaC CAATTOCACT ACCATCACaa TCCACAACCC ACTACTTAAC CTAGACCTAC accctcacag tccctttgtc GTCCaaCTGC 
121 AAAAOAGCTT cccccaattt CACCTACTaG cacaccacct cactccaaat gaccatccta atoccacacc attttcccat CTGGCCAGTa aactcatcca cctccacctt cctaccacac 

241 CGACGATTTT CCACATACCC ACCCCACCCG CTCGTAGAAT CrmCCG AC CACCACTACC ATTCCCTTTC CCCCATCCGT AGTCCAGaaC ACCCCCACCC CATCATGAAA TATCCCACCA 
361 AACTGCCCGA AAAACCATUT AAGATTACAA ACAACAACTT CCATGAGAAG ATCAAOCACC TCCCCACCCT ACTTGATACA CCCCATCCTC AAACCCCATC actctccttc cacaaccatg 
481 TTACCTGCAA CA CGCC T G CC GACTACTCCC TCATGCAGGA CGTGTACaTC AACGCTCCCC CAACTATTTA CCACCACCCT ATCAAACCCC TCCCCACCCT CTACTCCATT CCCTTCGACA 
601 CCACCCACTT CATCTTCTCC CCTATGGCAC CTTCCTACCC TCCATACAAC ACCAACTCCC CCCACCAAAA ACTCCTTCAA GCGCGTAACA TCCCACTCTC CAGCaCAAAO CTCACTCAAC 

721 GCACCACACC AAACTTCTCC ataatcacca acaaccactt caaccccccg tcaccccttt atttctccct tccatccaca ctttacccac aacacacacc caccttccac ACCTCCCATC 

841 TTCCATCCCT CTTCCACTTG AAAC CA AACC AGTCGTACAC TTCCCCCTCT GATACACTCC TGACCTCCGA ACCCTACCTA GTCAAGAAAA TCACCATCAC TCCCCCCATC ACGCGAGAAA 
961 CCCTCCCATA COCGCTTACA AACAATACCC AGCCCTTCTT CCTATGCAAA CTTACCCATA CAGTAAAAGG ACAACCCCTA TCCTTCCCCC TCTCCACCTA TATCCCCCCC ACCATATCCC 
1081 ATCACATCAC CCCCATAATC CCCACCCATA TCTCACCTCA CGATCCACAA AAACTTCTCC TTCCCCTCAA CCACCCAATC CTCATTAACC CTAACACTAA CACCAACACC AATACCATCC 
1201 AAAATTACCT TCTCCCAATC ATTCCACAAC CCTTCACCAA ATCCCCCAAG CA GCC CAAAC AACATCTTCA CAATCAAAAA ATCCTCCCCA CCAGACACCC CAACCTTACA TATCCCTCCT 
1321 TCTCCCCGTT TCCCACTAAC AAACTCCACT CCTTCTATCG CCCACCTCCA ACCCACACCA TCCTAAAACT CCCACCCTCT TTTACCCCTT TCCCCATCTC atccctatcg acta u.il n 
1441 TCCCCATCTC CCTCACCCAC AACATCAAAT TCGCATTACA ACCAAAGAAC CACCAAAAAC TCCTCCAACT CCCGGA G GAA TTACTTATCC ACCCCAACCC TCCTTTCGAC CATCCTCACC 
1561 ACCAATCCAC ACCCCACAAC CTCCCACAAC CACTCCCACC ATTACTCCCA CACAAACCTA TCCACCCACC TCCCCAACTT CTCTCCCAAC TCCACCCCCT CCACCCCCAC ACCCGAGCAG 
1681 CACTCCTCGA AACCCCCCCC CCTCATCTAA CCATAATACC TCAACCAAAT CACCCTATCA TCCCACACTA TATCCTTCTC tccCccatct CTGTCCTGAA gaaccctaaa ctcccaccac 
1801 cacaccccct accacaccac cttaacatca taacccactc cccaacatca ccaacctatc cactcgaacc ataccaccct AAACTACTCA tcccagcacc aactccccta ccatccccac 
1921 aattcttacc ACTCACTCAG ACCCCCACCC ttctctacaa cgaaagacac tttctcaacc ccaacctcta ccatattccc atccaccctc cccctaacaa tacacaacac caccactaca 
2041 accttacaaa cccacacctc CCACAAACAG ACTACCTCTT tcacctccac aacaacccat cccttaacaa ggaaga a gcc tcaccacttc tcctttcgcg acaactcacc aacccgccct 
2161 atcaccaact acctcttcac ccactcaaca ctccaccccc cctcccctac aaccttcaaa caataccact catacccaca ccaccatccg ccaactcacc tatcatcaao tcaactctca 
2281 CCCCACCTCA tcttcttacc ACCCGAAACA AAGAAAACTG ccccgaaatt cacccccacc tcctacccct caggcccatc cagatcacct cgaacacagt ccattccctt atcctcaacg 
2401 CATCCCACAA agccctacaa ctcctgtatg ttgacgaagc cttcccctcc cacccaccac cactacttgc cttcattgca atcctcacac cccctaacaa gctactacta tccccacacc 

2521 CTAACCAATC CCCATTCTTC AACATCATCC AACTAAAGGT ACATTTCAAC CACCCTCAAA AAGACaTATG TACCAAGACA TTCTaCAACT TTATCTCCCC ACCTTCCACA CACCCACTCA 
2641 CCCCTATTCT ATCCACACTG CATTACCATG GAAAAATGAA AACCACAAAC CCCTGCAAGA AGAACATCCA AATCCACATT acaggcccca ccaaccccaa cccaccggac atcatcctca 
2761 catctttccg cccctccctt aagcaactcc aaatcgacta tcccccacat cagctaatca cacccccccc CTCACAAGCC CTAACCAGAA AAGGAGTATA tgccctcccc CAAAAACTCA 
2881 ATCAAAACCC CCTGTACGCC ATCACATCAC AGCATCTCAA cctcttcctc ACCCGCACTC AGGACAGCCT AGTATCCAAA actttacagc ccgacccatg gattaagcac ctcactaacg 
3001 tacctaaagg aaattttcac cccaccatcg aggactccga agctgaacac aagggaataa TTCCTCCGAT AAACACTCCC gctcccccta CCAATCCCTT CAGCTCCAAG ACTAACCTTT 

3121 CCTCCCCCAA AGCACTCGAA CCCATACTGG CCACCCCCCC TATCCTACTT ACCCCTTCCC AGTGCAGCGA G CTGTTCC CA CACTTTGCCG ATGACAAACC ACACTCCCCC ATCTACGCCT 
3241 TACACCTAAT TTCCATTAAG TTTTTCCCCA TGCaCTTCAC AACCCCCCTC TTTTCCAAAC AGACCATCCC CTTAACCTAC CATCCTCCCC ACTCACCCAG GCCACTACCT CATTCGGACA 
3361 ACACCCCAGG AACACGCAAG TATCCCTACG ATCACCCCCT TCCCCCCGAA CTCTCCCCTA GATTTCCGCT CTTCCACCTA GCTGCGAAAG CCACACAGCT TGATTTGCAO acccgcagaa 

3481 CTACAGTTAT ctctccacag CATAACTTCC tcccactgaa ccgcaatctc cctcaccccttactccccca ccacaacgac aaacaacccc cccccctcca AAAATTCTTO agccacttca 
3601 AACACCACTC CCTACTTCTO atctcagaga aaaaaattga agctccccac aagagaatcg aatgcatcgc cccgattgcc atacccgccg cagataagaa ctacaacctc cctttcccct 
3721 ttccccccca ggcaccctac cacctcctct tcatcaatat tcgaactaaa tacagaaacc atcactttca acactccgaa gaccacccgc ccaccttgaa aaccctttcc ccttcccccc 

3841 TCAACTCCCT TAACCCCCCA GGCACCCTCG TGCTGAAGTC CTACGCTTAC CCCCACCCCA ATACTOAGCA CGTAGTCACC CCTCTTCCCA GAAAATTTGT CAGACTCTCT OCACCCACGC 
3961 CAGACTCCGT CTCAAGCAAT ACACAAATGT ACCTCATTTT CCGACAACTA gacaacagcc GCACACGACA attcaccccc CATCATTTCA ATTGTGTCAT TTCCTCCGTC TACCAGGCTA 
4081 CAAGAGACCG agttgcagcc gcaccctcgt ACCCTACTAA aagggagaac attcctcatt gtcaagagca AGCAGTTGTC aatgcagcca ATCCACTGGG cagaccacca gaaggagtct 
4201 cccctcccat ctataaacct tcccccaaca ctttcaccga ttcagccaca gacacaccta ccccaaaact CACTGTCTGC CAAGGAAACA aactcatcca cgcccttccc CCTCATTTCC 
4321 CGAAACACCC ACACGCAGAA CCCCTCAAAT TGCTGCAAAA CGCCTACCAT GCAGTCCCAG ACTTACTAAA TCAACATAAT atcaactctc TCGCCATCCC ACTGCTATCT acacgcattt 

4441 ACGCACCCGG aaaagaccgc cttcacgtat cacttaacto cttcacaacc gccctacaca gaactcatgc ggacgtaacc atctactccc tcgataagaa gtcgaaggaa agaatcgacg 
4561 CCCTCCTCCA acttaaggag tctctaactc agctgaagca tgagcatatg gagatcgacg acgagttagt atccatccat cccgacagtt gcctgaaggg aagaaagcca ttcagtacta 
4681 CAAAACGAAA cttgtattcg tactttcaag gcaccaaatt ccatcaacca gcaaaagata tcccccagat aaaggtcctc ttcccaaatg accaggaaao caacgaacaa ctctctccct 
4801 acatattcgc cgagaccatc caaccaatcc ccgaaaaatg cccgctccac cacaacccct cctctacccc gccaaaaacg ctcccctccc TCTCTATGTA TCCCATGACC CCAGAAAGGG 
4921 tccacagact CAGAAGCAAT AaCCTCAAAO AAGTTACACT ATCCTCCTCC accccccttc CAAAGTACaa aatcaagaat CTTCACAACC TTCACTCCAC AAAACTACTC CTCTTTAACC 
5041 CGCaTaCCCC CGCATTCCTT CCCCCCCCTA ACTACATAGA ACCACCaCAA CACCCTCCA6 ctccccctcc acaggccgag gaggcccccg CACTTCTAGC CACACCAACA CCACCTCCAC 
5161 CTGATAACAC ctcgcttcat ctcaccgaca tctcactgca catgcaagac'actagcgaac gctcactctt ttccagcttt accggatccg acaactaccg AAOCCACCTC CTCCTCCCTC 

5281 ACCTCCATCC CCTCCAAGAC CCTCCCCCTC TTCCACCCCC AAGCCTAAAC AACATCCCCC CCCTCCCaCC GGCAACAATC CACCAACACC CAACTCCACC CCCAAGCACC ACCTCTCCCC 
5401 ACCACTCCCT TCACCTTTCT TTTCATCCCC TATCTATATC CTTCCCATCC CTTTTCGACC CAGAGATCCC CCCCTTCCCA CCCCCACAAC CCCCCCCAaG TACATCCCCT ACCCaTCTCC 
5521 CTATCTCTTT CCCATCCTTT TCCCACCGaG ACATTCACGA CTTGACCCCC AGACTAACCC ACTCCGACCC CCTCCTCTTT CCCTCATTTG AACCCCCCCA ACTCAACTCA ATTATATCCT 
5641 CCCCATCACC CCTATCTTTT CCACCACCCA AGCaGACACC TAGACCCACC ACCAGCAGGA CCCAATACTC TCTAACCCCC CTACCTCCCT ACATATTTTC CACCGACACA GCCCCTCCCC 
5761 ACTTCCAAAA CAACTCCCTT CTCCAGaaCC aGCTTACACA ACCCACCTTC CACCGCAATC TTCTCCAAAG AATCTACCCC CCGCTCCTCC ACACCTCCAA ACAGGAACAC CTCAAACTCA 
5881 CCTACCACAT CATCCCCACC CAACCCAACA AAAGCACCTA CCACTCTCCA AAACTAGAAA aCCACAAAGC CATAACCACT CAGCCaCTCC TTTCACCGCT ACCCCTCTAT AACTCTCCCA 
6001 CACATCACCC ACAATCCTAT AACATCACCT ACCCGAAACC atcctattcc ACCAGTCTAC CACCCAACTA CTCTCACCCA AACTTTCCTC TAGCTCTTTC TAACAACTAT CTCCATCACA 
6121 ATTACCCCAC CCTACCATCT TATCAGATCA CCCaCGACTA CCATCCTTAC TTCCATATCC TAGACCCCAC ACTCCCTTCC CTAGATACTG CAACTTTTTC CCCCCCCAAG CTTACAACTT 
6241 ACCCCAAAAG ACACCAGTAT AGACCCCCAA ACATCCCCAC tccccttcca TCAGCGATCC AGAACACCTT GCAAAACCTG CTCATTCCCC CGACTAAAAG aaactccaac ctcacacaaa 

6361 tgcgtgaact gccaacactc cactcaccca cattcaacct tgaatgcttt cgaaaatatc catgcaatca ccactattcc caccactttc ccccaaagcc aattaccatc actactcact 

6481 TCCTTACCCC ATACCTCCCC AGACTCAAAG GCCCTAAGCC CCCCCCACTC TTCCCAAACA CCCATAATTT CCTCCCATTC CAAGAACTCC CTATCCATAC ATTCCTCATC CACATCAAAA 
6601 gagacctgaa acttacacct CCCACCAAAC ACACAGAAGA AAGACCCAAA CTACAAGTCA TACAACCCCC ACAACCCCTC GCGACCGCTT ACCTATGCCC CATCCACCCG GACTTACTCC 
6721 CCACCCTTAC AGCCGTTTTG CTACCCAACA TTCACACGCT CTTTCACATC TCGCCOCaCC aCTTTCaTCC AaTCaTAGCA GAACACTTCA aGCAACCTGA CCCGGTACTG GAGACGGATA 
6841 TCGCCTCCTT CGaCAAAAGC CAAGaCGACG CTATCGCGTT A ACCG G C CTO ATGATCTTCC AAGACCTOGG TGTGGACCAA ccactactcg acttgatcga ctococci h CCACAAATAT 
6961 CATCCACCCA TCTGCCCACG GGTACCCCTT TCAAATTCCC CCCCATCATC AAATCCCCAA TCTTCCTCAC CCTCTTTGTC AACACAGTTC TCAATCTCCT TATCCCCACC AGAGTATTGG 

7081 ACCA G CCCCT TAAAACCTCC aaatctgcag catttatccc ccaccacaac attatacacc cactactatc tcacaaacaa atgcctcaga ggtbtgo cac ctogctcaac atccacctta 

7201 ACATCATTCA CCCACTCATC CCCCACACA C CACCTTACTT CTGCGCTGGA TTCATCTTCC AACATTCCCT TACCTCCACA OCGTCfCGCG T CGCG GACCC CTTGAAAAGG CTCTTTAAGT 
7321 TGGGTAAACC CCTCCCAGCC GACGaTGAGC AACACCAACA CaGAAGACGC CCTCIOCTAC ATSaAACAAA CCCCTCCTTT ACACTACCTA TAACAGACAC CTTAGCAGTG CCCGTGGCAA 
7441 CTCCCTATGA GGTAGACAAC ATCACACCTG TCCTGCTGGC ATTCACAACT TTTCCCCAGA CCAAAAGAGC ATTTCAACCC ATCACACCGC AAATAAACCA TCTCTACGCT CCTCCTAAAT 
7561 ACTCACCATA CTACATTTCA TCTGACTAAT ACCACAACAC CACCACCATO AATACACCAT TCTTTAACAT CCTCCCCCCC CCCCCCTTCC CACCCCCCAC TCCCATCTOG ACCCCCCCCA 
7681 CAACCACCCA CCC CCCCCC C ATCCCTCCCC CCAATCCCCT CCCTTCCCAA AT CC AGCAA C TCACCACAGC CCTCACTCCC CTAGTCATTG GACACCCAAC TACACCTCAA ^TTCAfP CC 
7801 CACGC CCGCC CCCCCCCC AC AAGAAGCaGG CCCCAAACCA ACCa CCG A A O C CC AACAAAC CAAAAACACA CGACAACAAC AACAACCAAC CTCCAAAACC CAAACCCCCA AiVLMjiACAOC 
7921 GTaTGGCACT TAACTTCCAC CCCCACACAC tcttccacct CAAAAATCAC caccoacatc TCATCCGGCA CGCACT GCCC ATCGAACGAA ACCTAATQAA ACCACTCCAC CTCAAACCAA 
8041 CTATTCACCA CCCTCTCCTA TCAAAGCTCA AATTCACCAA gtcctcacca tacgacatcc acttcccaca cttococcic aacatgagaa ctcaggcgtt cacctacacc agtgaacacc 
8161 CTCAACCCTT CTACAACTCC caccaccgac ccctccacta tagtccaggc acatttacca TCCCCCCCGC ACTAGCACCC ACACGA C ACA CrCCTCCTCC CATTATCCAT AACTCACOCC 

8281 CCGTTGTCCC CATACTXCTC CCAGCCGCTC ATCACCCAaC AACAACCCCC CTTTCGGTCC TCACCTCCAA TACCAAACOO AACACAATCA ACACAACCCC CCAA C GGACA GAAGAGTCGT 
8401 CTCCTGCACC A CTCCTC ACC GCCATGTGCT PCCTTGCAAA CCTCAOCTTC CCATCCAATC CCCCCCCCAC ATCCTACACC CCCGAACCAT CCACACCTCT CCACATCCTC GAAGAGAACG 
8521 TCAACCACCA CCCCTACCAC ACCCTCCTCA ACCCCATATT GCCCTCCCCA tcctccccca caactaaaac aaccgtcact CACCACTTTA ccttgaccac CCCGTACTTG CCCACATGCT 
8641 CCTACTCTCA CCATACTCAA CCCTGCTTTA CCCCCATTAA CATCGAOCAC GTCTOCGATG AAGCCCACCA CAACACCATA CGCATACAGA CTTCCCCCCA CTTTCCATAC GACCAAAOCO 
8761 CACCACCAAC CTCAAATAAC TACCCCTACA TGTCGCTCCA CCACCATCAT ACTGTCAAAG AACCCACCAT ggatcacatc aacatcacca cctcaccacc gtgtagaagg cttagctaca 
8881 AACCATACTT TCTCCTCGCG AACTCTCCTC CACCGGACAO CCTAACOGTT ACCATACCCA CTACCAACTC ACCAACCTCA TCCACAATCC CCCCCAAGAT AAAACCAAAA TTCCTCGCAC 
9001 CCCAAAAATA TCACCTACCT CCCGTTCACG CTAACAACAT TCCTTCCACA CTCTACCACC GTCTGAAAGA AACAACCCCC CCCTACATCA CTATCCACAC GCCGCGACCG CACCCCTATA 

9121 CATCCTATCT CGACCAATCA TCACCCAAAC tttaccccaa cccaccatcc ccgaagaaca ttacctacca gtgcaagtgc ccccattaca acaccccaac ccttaccacc cctacccaaa 

9241 TCACCCCCTC CACCCCCATC AAGCAGTGCG TCCCCTATAA CAGCCACCAA ACGAAGTCCC TCTTCAACTC CCCCCACTCC ATCAGACACG CCGACCACAC GGCCCAAGGG AAATTGCATT 
9361 TCCCTTTCAA CCTCATCCCG AGTACCTGCA TCGTCCCTCT TCCCCACGCa CCGAACCTAG TACACGGCTT TAAACACATC AGCCTCCAAT TAGaCaCaGA CCATCTGACA TTCCTCACCA * 

9481 CCAGGAGACT AGGG C CAAAC ccggaaccaa ccactcaatg gatcatcgga aacacggtta gaaactzcac cctcgaccga gatggcctgg AATACATATG gggcaatcac GAACCAGTAA 
9601 CGGTCTATGC CCAAGAGTCT CCACCAGCAG accctcacgg atggccacac gaaatagtac agcattacta tcatcgccat cctctgtaca ccatcttagc cctcccatca gctgctgtcg 
9721 cgatgatgat tggcgtaact cttccagcat tatgtgcctg taaagcgcgc cctgagtgcc tcacgccata tcccctgccc ccaaatcccc tcattccaac ttccctcgca cttttgtcc t 

9841 GTGTTAGGTC GGCTAATGCT GAAACATTCA CCGACACCAT CACTTACTTA TGGTCGAACA GCCaGCCCTT CTTCTGGCTC CAG CTGTGTA TACCTCTGGC CGCTGTCGTC CTTCTAATCC 
9961 CCTCTTCCTC ATCCTGCCTC CCTTTTTTAG TCCTTGCCGG CCCCTACCTG GCGAAGGTAG ACGCCTACGA ACATCCGACC ACTGTTCCAA ATGTGCCACA GATACCGTAT AAGCCACTTC 
10081 TTGAAAGGCC AGGGTACGCC CCCCTCAATT TCGACATTAC TCTCATGTCC TCGGAGGTTT TCCCTTCCAC CAACCAAGaG TACATTACCT GCAAATTCAC CACTCTCGTC CCCTCCCCTA 
10201 AAGTCAGATG CTGCGGCTCC TTCGAATGTC AGCCCCCCGC TCACGCAGAC TATACCTGCA AGGTCTTTCG AGGCGTGTAC CCCTTCATGT GGGGAGGApC ACAATGTTTT TGCCACAGTG 
10321 AGAACAGCCA GATGAGTGAG GCGTACGTCG AA I IL IT-ACT AGATTGCGCG ACTGACCACG CCCAGCCGAT TAAGGTGCAT ACTGCCGCGA TGAAAGTAGG ACTGCGTATA CTCTACCGCA 

10441 ACACTACCAG tttcctacat gtgtacgtga acggagtcac accaggaacg tctaaagacc tcaaagtcat AGCTGGACCA atttcagcat TGTTTACACC ATTCCATCAC aacctcctta 

10561 TCAATCCCCG CCTCCTGTAC AACTATGACT TTCCGGAATA CGGAGCGATG AAA C CAGGaG CCTTTGGAGA CATTCAAGCT ACCTCCTTGA CTAGCAAAGA CCTCATCGCC A C CACAGACA 
10681 TTAGGCTACT CAAGCCTTCC CCCAACAACG TGCATCTCCC CTACACCCaC GCCGCATCTC CATTCGAGAT CTGGAAAAAC AACTCAGCCC CCCCACTGCA GGAAACCGCC ccttttggct 

10801 GCAA6ATTGC agtcaatccg cttcgagcgg tggactcctc atacggcaac attcccattt ctattgacat cccgaaccct gcctttatca ggacatcaca tccaccactg CTCTCAACAG 

10921 TCAAATGTGA TC7CAGTGAG TCCACTTATT CAGCGGACTT CGGAGCGATG GCTACCCTGC ACTATGTATC CCACCGCGAA CGACAATCCC CTGTACATTC GCATTCGAGC ACAGCAACCC 
11041 TCCAAGAGTC CaCACTTCAT CTCCTGGAGA AACGAGCCCT cacagtacac TTCAGCACCC CCACCCCACA GCCGAACTTC ATTCTATCCC TCTGTGCTAA GAAGACAACA TCCAATGCAC 
11161 AATGCAAACC ACCACCTGAT CATATCGTCA CCACCCCCCA CAAAAATCAC CAAGAATTCC AAGCCGCCAT CTCAAAAACT TCATGGACTT GGCTCTTTGC CCTTTTCGGC GGCGCCTCCT 
11281 CGCTATTAAT TaTACGACTT ATCATTTTTC CTTCCAGCAT CATCCTGACT AGCACACCAA CATCACCCCT ACCCCCCAAT CACCCCACCA CCAAAACTCG ATGTACTTCC CACCAACTCA 
11401 TCTCCATAAT CCATCACCCT GCTATATTaC aTCCCCGCTT AC C CCCCCCA ATATAGCAAC ACCAAAACTC CACCTATTTC CCAGGAACCG CAGTCCATAA TCCTGCGCAG TGTTCCCAAA 
11521 TAATCACTAT ATTAACCATT TATTCACCCG ACGCCAAAAC TCAATCTATT TCTCACGAAG CATCGTCCAT AATGCCATGC ACCCTCTCCA TAACTTTTTA TTATTTCTTT TATTAATCAA 
11641 CAAAATTTTC TTTTTAACaT TTC 
Nucleotide Sequence of TR339 

1 ATTGGCGGCG TAGTACACAC TATTCAATCA AACACCCCAC CAATTGCACT ACCATCACAA TGGAGAAGCC AGTAGTAAAC GTAGACOTAC *T*GAG ltL UlllUIC GTCCAACTCC 
121 AAAAAACCTT CCCCCAATTT CACOTACTAC CACACCACCT CACICCAAAT CACCATCCTA ATGCCAGACC ATTTTCCCAT CTGCCCAGTA AACTAATCGA GCTCCAGGTT CCTACCACAO 
241 CCACCATCTT CCACATAOOC ACCOCACCGC CTCGTAGAAT GTTTTCCGAG CACCAGTATC ATTCTCTCTO CCCCATCCCT ACTCCAGaaC ACCCCGACCO CATGATCAAA TATGCCAGTA 
361 AACTGCCGGA AAAAGCGTGC AAGATTACAA ACAAGAACTT GCATGAGAAG ATTAACGATX TCCGGACCGT ACTTCATACG CCGGATGCTC AAACACCATC OUUUUH CACAACCATG 
481 TTACCTGCAA catgcctgcc GAATATTCCG TCATGCAGGA CCTGTATATC AACGCTCCCG GAACTATCTA TCATCAGCCT ATGAAAGCCG tccccaccct otactggatt cccttcoaca 
601 CCACCCAGTT CATCTTCTCO GCTATCGCAO GTTCGTACCC TCCGTACAAC ACCAACTGCG CCGACOACAA AGTCCTT GA A GCCCGTAACA TCGGACTTTG CAGCACAAAO CTOAGTOAAC 
721 OTAGGACACO AAAATTGTCG ATAATGAGGA agaaggagtt gaa gccc ggg tcgcgggttt atttctccgi aggatcgaca ctttatccac AACACAGAGC CAGCTTCCAO AGCTCGCATC 
841 TTCCATCGGT GTTCCACTTC AATGGAAAGC AGTCGTACAC TTGCCGCTGT GATACAGTCG TGAGTTGCGA AGGCTACGTA GTGAAGAAAA TCACCATCAO TCCCGGGATC *CTO GAGAAA 
961 CCGTGGGATA CGCGGTTACA CACAATAGCG AGGGCTTCTI GCTATCCAAA GTTACTGACA CAGTAAAAGG AGAACGGGTA 1LU1UXIG TCTGCACGTA CATCCCCGCC ACCATATGCG 
1081 ATCAGATGAC TGGTATAATG GCCACGGATA TATCACCTGA CGATGCACAA AAA CTTCTCG TTGGGCTCAA CCAGCGAATT GTCATTAACG GTAGGACTAA CAGGAACACC AACACCATCC 
1201 AAAATTACCT TCTCCCGATC ATAGCACAAG GGTTCAGCAA ATCGGCTAAG GAGCGCAAGG ATGXTCTTGA TAACGAGAAA ATCCTOO GTA CTAGAGAACG CAAGCTTACO TATCCCTCCT 
1321 TGTGGGCGTT TCGCACTAAG AAAGTACATT CCTTTTATCG CCCACCTGGA ACGCAGACCA TCGTAAAAGT CCCAGCCFCT TTTAGCGCTT TTCCCATCTC GTCCGTATCG ACGACCTCTT 
1441 TGCCCATGTC GCTGAGGCAG AAATTCAAAC TGGCATTGCA ACCAAAGAAG GAGGAAAAAC TCCTGCAGGT CTCGGAGGAA TTAGTCATGO AGGCCAAGOC TCCTTTTGAG GA1GCTCAGG 
1561 AGGA A GCCAO AGCGGAGAAG CTCCGAGAAG CACI 1LLACC ATTACTGCCA GACAAAGGCA TCGAGGCAGC CGCAGAAGTT GTCTGCGAAC TGGAGGGGCT CCAGGCGGAC ATCGGAGCAG 

1681 CATTAGTTGA aaccccgcgc ggtcacgtaa ggataatacc tcaagcaaat gaccgtatga tcgcacagta tatcgttgtc tcgccaaact ctgtcctgaa GAATGCCAAA ctcgcaccag 
1801 CCCACCCGCT AGCAGATCAG CTTAAGATCA taacacactc cggtagatca ggaaggtacg cggtcgaacc atacgacgct aaagtactca tgccaccagg aggtgccgta ccatggccag 

1921 AATTCCTAGC ACTGACTCAO AGCGCCACGT TAGTCTACAA CGAAAGAGAG TTTGTGAACC GCAAACTATA CCAC ATTGCC ATGCATGGCC CCGCCAAGAA TACAOAAGAO GAGCAGTACA 

2041 AGGTTACAAA ggcagagctt GCAGAAACAG agtacgtutt tgacgtcgac aagaagcgtt gcgttaagaa ggaagaagcc tcao gtctc g TCCTCTCGGG agaactgacc AAc e ri m.1 

2161 ATCATGAGCT AGCTCTGGAG GGACTGAAGA CCCGACCTGC GCTCCCGTAC AAGGTCGAAA CAATAGGAGT GATAGGCaCA CCGGGGTCGG GCAAGTCAGC TATTATCAAG TCAACTG TCA 
2281 CGGCACGGGA TCTTCTTACC AGCGGAAAGA AAGAAAATTG TCGCGAAATT GAGGCCGACG TGCTAAGACT GAGGGGTATG CAGATTACCT CGAAGACAGT AQATTCGOTT ATGCTCAACG 
2401 GATGCCACAA AGCCGTAGAA GTGCTGTACG TTGACGAAGC GTTCGCGTGC CACGCAGGAG CACTACTTCC CTTGATTCCT ATCGTCAGGC CCCGCAAGAA GGTAGTACTA TGCGGAGACC 
2521 CCATCCAATG CGGATTCTTC AACATGATGC AACTAAAGGT acatttcaat CACCCTGAAA AAGACATATG caccaagaca ttctacaagt atatctccco gcgttgcaca cagccagtta 
2641 CAGCTATTCT ATCGACACTG CATTACGATO GAAACATGAA AACCACGAAC CCGTGCAAGA AGAACATTGA AATCGATATT ACAGGGGCCA CAAAGCCCAA GCCAGGGGAT ATCATCCTGA 
2761 CATGTTTCCG CGGCTCGGTT AAGCAATTGC AAATCGACTA TCCCGGACAT GAAGTAATGA CAGCCGCGGC CTCACAAGGG CTAACCAGAA AAGGAGTGTA TCCCGTCCGC CAAAAAGTCA 
2881 A7GAAAACCC actctacccg atcacatcag agcatgtgaa cgtgttgctc acccgcacto AGGACAGGCT AGTGTGGAAA ACCTTCCAGG CCOACCCATO gattaagcag CTCACTAACA 
3001 tacctaaagg aaactttgag gctactatac aggaciggga agctcaacac aagggaataa t tgctgc aat aaacagcccc actccccgto ccaatccgtt cagctgcaag accaacgttt 
3121 GCICGGCGAA AGCATTGGAA ccgatactag ccacggccgo tatcgtactt accggttgcc agtcgagcga actgttccca cagtttgcgg ATGACAAACC acattcggcc atttacgcct 

3241 TACACGTAAT TTGCATTAAG TTTTTCGGCA TCGACTTCAC AAGCGGACTG TTTTCTAAAC AGAGCATCCC ACTAACGTAC CATCCCGCCG ATTCAGCGAG GCCGGTAGCT CATTGGGACA 
3361 ACAGCCCAGG AACCCGCAAO TATGGGTACG ATCACGCCAT TGCCGCCGAA CTCTCCCCTA GATTTCCCCT GTTCCAGCTA GCTCCGAAGG GCACACAACT TGATI PGCAO ACGGCGAGAA 
3481 CCACAGTTAT CTCTGCACAG CATAACCTGG TCCCGGICAA CCGCAATCTT CCTCACGCCT TAGTCCCCGA GTACAAGGAG AAGCAACCCG GCCCGGTCGA AAAATTCTTG AACCAOTTCA 
3601 AACACCACTC AGTACTTGTG GTATCAGAGG AAAAAATTCA ACCTCCCCGT AAGAGAATCG AATGGATCGC CCCGATTGGC ATAGCCGOTO CAGATAAGAA CTACAACCTG O CTTTC OGGT 
3721 TTCCGCCCCA GGCACCGTAC GACCTGGTGT TCATCAACAT TGGAACTAAA TACAGAAACC ACCACTTTCA GCAGTCCGAA GACCATGCGG CGACCTTAAA AACCCTTTCG CGTTCG GCCC 
3841 TCAATTGCCT TAACCCAGGA GGCACCCTC G TGGTGAAGTC CTA7GGCTAC GCCOACCGCA ACAGTGAGGA CGTAGTCACC GCTCTTCCCA GAAAGTTTGT CAGGGTCTCC GCAGCOACAC 
3961 CAGATTCTGT CTCAAGCAAT ACAGAAATGT ACCTCATTTT CCGACAACTA GACAACAGCC GTACACGGCA ATTCACCCCG CACCATCTGA ATTCCC7CAT nCGTCCGTC TATCAGGGTA 
4081 CAaCAGATGG AGTTGCAGCC GCGCCGTCAT ACCGCACCAA AAGGGAGAAT ATTGC7CACT GTCAAGAGGA AGCAGTTGTC AACGCAGCCA ATCCGCTBGG TAGACCAGGC GAAGCAGTCT 

4201 GCCGTGCCAT CTATAAACCT 7GGCCGACCA GTTTTACCGA ttcagccacg gagacaggca ccgcaagaat gactgtgtgc ctaggaaaga aagtgatcca cgcggtcggc cctoatttcc 
4321 GGAAGCACCC agaagcagaa gccttoaaat tcctacaaaa cgcctaccat gcagtcgcag acttagtaaa tgaacataac atcaagtctg tcgccattcc actgctatct acaggcattt 

4441 ACGCAGCCCG AAAAGACCGC CTTCAACTAT CACTTAACTG CTTGACAACC GCGCTAGACA GAACTGACGC GGACGTAACC ATCTATTGCC TCGATAAGAA GTCGAAGGAA AGAATCGACG 
4561 CGCCACTCCA ACTTAAGGAG TCTGTAACAG AGCTGAAGGA TCAAGATATG GACATCCACG ATCAGTTAGT ATCGATCCAT CCAGACAGTT CCTTGAACGG AAGAAAGGGA TTCAGTACTA 
4681 CAAAA G CAAA ATTCTATTCG TACTTCGAAG GCACCAAATT CCATCAAGCA GCAAAAGACA TCCCGGACAT AAAG G T CCTO TTCCCTAATG ACCACGAAAG TAATCAACAA CTCTGTGCC1 ' 
4801 ACATATTCGG TCAGACCATG GAACCAATCC GCCAAAAGTG CCCGCTCCAC CATAACCCGT CGTCTAGCCC GCCCAAAACG TICCCCTGCC TTTGCATGTA TGCCATCACG CCACAAAGGG 
4921 TCCACAGACT TAGAAGCAAT AACOTCAAAG AACTTACAGT ATCCTCCTCC ACCCCCCTTC CTAAGCACAA AATTAAGAAT GTTCAGAAGG TTCAGTGCAC GAAAGTAGTC CTCTTTAATC 
5041 CGCACACTCC CGCATTCGTT CCCGCCCGTA AGTACATAGA AGTCCCAGAA CAGCCTACCG CTCCTCCTG C ACAGGCCGAG GACGCCCCCG AAGTTGTAGC GACACCGTCA CCATCTACAC 
5161 CTCATAACAC CT CCCTTG AT GTCACAGACA TCTCACTCGA TATGGATGAC AGTAGCGAAG CCTCACTTTT TTCGAGCTTT AGCGCATCGG ACAACTCTAT TACTACTATG GACACTTCCT 
5281 CGTCAGGACC TAGTTCACTA GAGATAGTAG ACCGAAGGCA GGTCGTCCTC GCTGACGTTC ATCCCGTCCA AGAG CC T GC C CCTATTCCAC CCCCAAGGCT AAAGAAGATO CCCCGCCTGG 
5401 CACCCGCAAG AAAAGAGCCC actccacccg caagcaatag ctctcagtcc ctccacctct CTTTTGGTCG GGTATCCATO TCCCTCCGAT CAATTTTCGA CGGAGAGACG GCCCCCCAGG 
5521 CAGCGGTACA ACCCCTGGCA ACAGGCCCCA CGGATCTGCC TATGTCTTTC GGATCGTTTT CCGACGGAGA CATTCATCAG CTCAGCCGCA GAGTAACTCA GTCCCAACCC GTCCTCTTTG 
5641 GATCATTTGA ACCGGGCGAA GTGAACTCAA TTATATCGTC CCGATCAGCC GTATCTTTTC CACTACGCAA GCAGAGACGT AGACGCAGGA GCACGAGGAC TCAATACTGA CTAACCCCGG 
5761 TAGGTCGGTA CATATTTTCG ACGGACACAG GCCCTCCGCA CTTGCAAAAG AAGTCCGTTC TGCAGAACCA CCTTACAGAA CCGACCTTCG AGCGCAATGT CCTGGAAAGA ATTCATCCCC 

5881 CGGTGCTCGA CACCTCGAAA cagcaacaac tcaaactcag gtaccagatg atgcccaccg aagccaacaa aagtaggtac cagtctcgta aactagaaaa tcagaaagcc ataaccactg 
6001 AGCGACTACT ctcacgacta cgactgtata actctgccac agatcagcca CAATGCTATA AGATCaCCTA tccgaaacca ttgtactcca ctagcgtacc ggcgaactac tccgatccac 
6121 AGTTCGCTCT AGCTCTCTGT AACAACTATC TGCATGAGAA CTATCCGACA GTAGCATCTT ATCAGATTAC TGACGAGTAC GATGCTTACT TGCATATGGT ACACGGGACA GTCGCCTCCC 
6241 TGCATACTGC AACCTTCTGC CCCGCTAAGC TTAGAAGTTA CCCGAAAAAA CATGAGTATA GAGCCCCCAA TATCCGCAGT CCGGTTCCAT CAGCGATGCA GAACACGCTA CAAAATGTGC 
6361 TCATTGCCGC AACTAAAAGA AATTGCAACG TCACGCACAT GCGTGAACTG CCAACACTGG ACTCAGCGAC ATTCAATGTC C AATC X.MIL GAAAATATGC ATGTAATGAC GAGTAITCCC 

6481 AGGAGTTCCC TCGGAAGCCA ATTACGATTA ccactcagtt tctcacccca tatctaccta gactgaaagg ccctaaggcc gccgcactat ttgcaaagac GTATAATTTG gtcccattcc 
6601 AAGAAGTGCC TATGGATAGA TTCCTCATCG ACATCAAAAG AGACCTCAAA gttacaccag GCACGAAACA cacagaagaa agacccaaag TACAAGTGAT ACAAGCCGCA gaacccctgg 
6721 CGA CTCCTTA CTTATOCGCC ATTCACCCCO AATTAGTGCG TACOCTTACO CCCOTCTTCC TTCCAAACAT TCACACOCTT TTTGACATGT COGCCO AC QA TTTTGATCCA ATCATAOCAC 
6841 AACACTTCAA GCAACCCGAC CCCCTA CIGO ACACCCATAT CCCATCATTC QACAAAACCC AA C A CO A C CC TATGCCCTTA ACCGGTCTCA TCATCTTCGA GGA C CT GGOT GTGGATCAAC 

6961 CACTACTCCA cttgatcgag tgcccctttg cacaaatatc at cc ac cc at ctacctaccc ctactccttt taaai iccgs gcgatgatga aatccccaat cttcctcaca ctttttctca 

7081 ACACAGTTTT GAATCTCCTT ATCOCCACCA CACTACTAGA ACAOCOCCn AAAACCTCCA CATCTCCACC GTTCATTCCC CACGACAACA TCATACATCO AGTAGTATCT CACAAACAAA 
7201 TCCCTCACAO C r CCCCC A CC TCGCTCAACA TOOACGTTAA CATCATCOAC GCAGTCATCO CTCACACACC ACCTTACTTC T COCCOCCAT TTATCTTCCA AOATTCCCTT ACTTCCACAG 
7321 CGTCCCGCOT CCCOCACCCC CTCAAAACGC TCTTTAACTT GGGTAAACCG CTCCCACCCO ArC iA C GAfiTA AGACOAACAC ACAAOA CCC O CTCTCCTACA TCAAA C AAAO CCCTOCTTTA 
7441 GAOTAGGTAT AACAGGCACT TTAGCAGTGG CCGTCA06AC CCGGTATGAO GTAGACAATA TTACACCTGT CCTACTGGCA TTGAGAAC7T ITCCCCAGAG CAAAAGACCA TTCCAAGCCA 
7561 TCAGAGGGGA AATAAAGCAT ctctacggto gtcctaaata ctcagcatag tacatttcat ctcactaata ctacaacacc ACCACCATGA atagaggatt CTTTAACATG ctpggccgcc 
7681 gccccttccc ggcccccact gccatotcga ggccgcggag aagoaggcag gcggccccga roccToecco CAACGGGCTG GCTTCTCAAA TCCAGCAACT GACCACAGCC gtcagtcccc 
7801 TAOTCATTGG acaggcaact AGACCTCAAC ccccacgtcc acgcccgcca ccgcgccaga a g aagcagg c gcccaagcaa ccac c gaagc c gaaoaaac c AAAAACCCAO C A GAAGAAGA 

7921 AGAAGCAACC TGCAAAACCC AAACCCGGAA AGAGACAC CG CATGGCACTT AAGTTGGAGG CCG AC AGATT CTTCGACGTC AAGAA C GAGG ACGGAGATOT CATCGGGCAC GCACTGGCCA 
8041 TGGAAGGAAA GGTAATGAAA CCTCtGCA CO TGAAAGGAAC CATCGACCAC CCTGTGCTAT CAAAGCTCAA ATTTACCAAG TTJCTCAGCAT ACCACATGCA GTTCGCACAG TTGCCAGTCA 
8161 ACATGAGAAG TCAGGCATTC ACCTACACCA GTCAACACCC CGAAGGATTC TATAACTGCC ACC AC GGAGC GGTGCAGTAT AGTGCAGGTA GATTTACCAT CCCTCGCGGA GTAGGAGGCA 
8281 OAGOAGACAG CGGTCGTCCC ATCATCGATA ACTCCGGTCG GGTTGTCGCO ATAGTCCTCG GTGGAGCTCA TGAAGOAACA CGAACTGCCC TTTCGGTCGT CACCTGGAAT AGTAAAGGGA 
8401 AOACAATTAA OAfflAfCfffl GAAGGGACAG AAGAGTGCTC CGCAGCACCA CTCGTCACGG CAATGTG7TT GCTCGGAAAT GTGAGCTTCC CATGCGACCG CCCGCCCACA TGCTATACCC 

8521 gcgaaccttc cag mjMK gacatcctto AAGAGAACGT GAACCATGAG GCCTACGATA CCCTGCTC AA TCCCATA7TO CGGTCCGGAT cgtctggcao aagcaaaaga agcgtcactg 
8641 ACGACTTTAC CCTGACCAGC CCCTACTTGG GCACATGCTC gtactgccac CATACTOAAC cgtocttcao ccctgttaag atcgagcagg tctgggacga agcggac c at aacaccatac 

8761 GCATACAGAC TTCCGCCCAG TTTCGATACG ACCAAAGCGG AGCAGCAAGC GCAAACAAGT ACCGCTACAT GTCGCTTCAG CAGGATCACA CCGTTAAAGA AGGCACCATO QATGACATCA 
8881 AGATTAGCAC CTCAGGACCG TGTAGAAGGC TTACCTACAA AGGATACTTT CTCCTCGCAA AATSCCCTCC AGGGGACAGC OTAACGGTTA GCATACTCAG TAGCAACTCA GCAACGTCAT 
9001 OTACACTGGC CCGCAAGATA AAACCAAAAT TCG7GGGACG GGAAAAATAT GATCTACCTC CCGTTCACGG TAAAAAAATT CCTTGCACAG TGTACOACCO TCTGAAAGAA ACAACIGCAG 
9121 GCTACATCAC TATGCACAGG CCGGGACCGC ACGCTTATAC ATCCTACCTG GAAGAATCAT CAGGGAAAGT TTACGCAAAG CCGCCATCTG GGAAOAACAT TACGTATGAG TGCAAGTCCG 
9241 GCGACTACAA gaccggaacc gtttcg accc gcaccgaaat cactcgttgc ACCGCCATCA ACCAGT GC GT CGCCTATAAG AGCGACCAAA cgaagtgggt CTTCAACTCA CCGGACTTCA 
9361 TCAGACATGA CGACCACACG GCCCAAGGGA AATTGCATTT GCCTTTCAAG TTCATCCCGA GTACCTGCAT GGTCCCTOTT GCCCACGCGC CGAATGTAAT ACAJOU. Ml AAACACATCA 
9481 OCCTCCAATT AGATACAGAC CACTTCACAT TGCTCACCAC CAGGAGACTA GGGGCAAACC CGGAACCAAC CACTGAATCG ATCGTCGGAA AGACGGTCAG AAACTTCACC GTCGACCGAG 
9601 ATGGCCTGGA ATACATATGG GGAAATCATG AGCCACTGAG GGTCTATGCC CAAGAGTCAG CACCAGGAGA CCCTCACGGA TCGCCACACG AAATAGTACA GCATTACTAC CATCCCCATC 
9721 CT07GTACAC CATCTTAGCC CTCGCATCAG CTACCGTCGC GATCATOATT GGCGTAACCG TTGCAGTCTT ATGTCCCTGT AAAGCGCGCC G7GAGTGCCT GACGCCATAC GCCCTCGCCC 
9841 CAAACGCCGT AATCCCAACT TCGCTCGCAC rCTTCTGCTG CGTTAGGTCO GCCAATSCTG AAACGTTCAC C GA G ACC AT G AGTTACTTCT GGTCGAACAG TCAGCCGTTC TTCTGGGTCC 
9961 AGTTG7PCAT A CCTTTGG CC CCTTTC ATCG TTCTAATGCG CTGCTGCTCC roCTOCCTGC CTTTnTAGT GGTTSCCGGC GCCTACCTGG CGAAGGTAGA CGCCTACGAA CATGCGACCA 
10081 CTOTTCC AAA TGTGCCACAG ATACCGTATA AGGCACTTGT TCAAAGGGGA GGGTATCCCC CGCTCAATTT GGAGATCACT GTCATGTCCT CGGAGGTTTT OCCTTCCACC AACCAAGAGT 
10201 aCATTACCTG CAAATTCACC A CTOTCCTCC CCTCCCCAAA AATCAAATGC TG CGBCICCT TCGAATGTCA GCCGGCCGCT CATCCAGACT ATACCTGCAA GGTCTTCGGA GOGOTCTACC 
10321 CCTTTATGTG GGGAGGAGCG CAATGTTTTT GCGACAGTGA GAACACCCAO ATGAGTCAGG CGTACCTCGA ACTGTCAGCA GATTGCGCGT CTCACCACGC GCAGCCGATT AAGGTGCACA 
10441 CTGCCGCGAT GAAAGTACGA CTGCGTATAC TGTACGGGAA CACTACCAGT TTCCTAGATO TGTACOTGAA CGGAGTCACA CCAGGAACGT CTAAAGACTT GAAACTCATA GCTGCACCAA 
10561 TTTCAGCATC OTTTACGCCA TTCGATCATA AGGTCCTTAT CCATCGCGGC CTGG T GT ACA ACTATOACTT CCCGGAATAT GGAGCOATGA AACCAGGAGC CTTTGOACAC ATTGAAGCTA 
10681 CC TC CTT GA C tagcaaggat ctcatcccca GCACAGACAT TAGGCTACTC AAGC CTTCCC ccaagaacgt gcatgtcccg tacacgcago ccgcatcagg atttcagatg tgoaaaaaca 

10801 ACTCAGGCCG CCCACTGCAG GAAACCGCAC CTTTCGGCTO TAAGATTGCA GTAAATCCGC 7XCGAGCGGT GGACTGTTCA TACGGGAACA TTCCCATTTC TA7TGACATC CCGAA CGCT D 
10921 CCTTTATCAC GACATGAGAT GCACCACTCG TCTCAACAGT CAAATGTGAA GTCAGTGAGT GCACTTATTC AGCAGACTTX GGCGCGATGG CCACCCTCCA GTATGTATCC GACCGCGAAG 
11041 GTCAATGCCC CGTACATPCO CATTCGAGCA CAGCAACTCT CCAAGAGTCG ACAGTACATG TCCTGGAGAA AGGAGCGGTO ACAGTACACT TTAGCACCGC GAGTCCACAO GCGAACTTTA 
11161 TCGTATCCCT CTGTCGGAAG AAGACAACAT GCAATGCAGA ATGTAAACCA CCAGCTGACC ATATCCTGAG CACCCCGCAC AAAAATOACC AAGAATTTCA AGCCGCCATC TCAAAAACAT 
11281 CATGGAGTTG G C7UTTTG CC CTTTTCGGCG GCGCCTCCTC GCTATTAATT ATAOGACTTA 7GATTTTTGC TTCCAGCATG ATGCTCACTA GCACACGAAG ATGACCGCTA CGCCCCAATG 
11401 ATCCGACCAG CAAAACTCGA TCTACTTCCG AGGAACTGAT GTGCATAATG CATCACGCTC GTACATTAGA TCCCCCCTTA CCCCGCGCAA TATAGCAACA CTAAAAACTC GATGTACTTC 
11521 CGAGGAAGCG CAGTGCATAA TCCTGCGCAG TGTTGCCACA TAACCACTAT ATTAACCATT TATCTAGCGG ACGCCAAAAA CTCAATGTAT TTCTGAGGAA GCGTGGTCCA TAATGCCACG 
11641 CAGCCTCTCC ATAACTTTTA TTATTTCTTT TATTAATCAA CAAAATTTTG TTTTTAACAT TTC 